ABSTRACT


This Handbook describes the practice of Federal research
impact assessment (RIA). It describes research impact evaluation
for research selection, review, and ex-post assessment. It
describes retrospective methods (such as Projects Hindsight and
TRACES), qualitative methods (such as peer review), and
quantitative methods (such as cost-benefit analysis and
bibliometrics).

The Handbook is structured as follows. Section I, Title/Author/
Abstract/Background, contains the rationale for the
contents  of  this  Handbook.  Section  II,  Overview/Executive  Summary 
of  Handbook,  describes  general  problems  and  promises  of  RIA,  then 
provides  an  Executive  Summary  of  the  Handbook.  Section  III, 
Introduction,  is  the  first  section  of  the  main  body  of  the  Handbook 
and  shows  the  importance  of  the  topic. 

Section  IV,  Research  Impact  Evaluation  Techniques,  describes 
and  critiques  the  different  methods  used  to  assess  research  impact. 
Section  IV-A  describes  qualitative  methods,  including  peer  review 
problems  and  principles  for  quality  peer  reviews,  peer  review 
processes  for  proposed  programs/projects  (National  Science 
Foundation,  National  Institutes  of  Health,  Office  of  Naval 
Research, Dutch STW), and peer review processes for existing
programs/projects  (Department  of  Energy  Office  of  Basic  Energy 
Sciences,  Office  of  Naval  Research,  National  Institute  of  Standards 
and  Technology,  Army  Research  Laboratory,  Department  of  Energy 
National Laboratories). Section IV-B describes semi-quantitative
methods,  including  Project  Hindsight,  three  TRACES  studies,  and 
accomplishment  books  from  the  Office  of  Naval  Research,  Air  Force 
Office  of  Scientific  Research,  Department  of  Energy  Office  of 
Health  and  Environmental  Research,  Department  of  Energy  High  Energy 
Physics  Program,  and  Advanced  Research  Projects  Agency.  Section 
IV-C describes quantitative methods, including bibliometrics,
cost-benefit/economic analyses, cost-efficiency analysis, co-occurrence
phenomena, network modeling for direct/indirect impacts, roadmaps
for science and technology evolution, and expert networks.

Section  V,  Recommended  Areas  of  Research  for  RIA,  contains 
recommended  topics  to  be  pursued  which  would  advance  RIA.  Section 
VI,  Research  Impact  Assessment  -  Summary  and  Conclusions, 
summarizes  the  results  of  this  Handbook.  Section  VII,  RIA  Options 
for  Research  Sponsoring  Organizations,  is  a  self-contained 
description  of  the  different  types  of  research  evaluations 
recommended  for  organizations.  While  the  focus  is  on  Federal 
agencies,  the  principles  and  mechanics  of  implementation  are  valid 
for  non-Federal  organizations  as  well.  Section  VIII,  Analysis  of 
RIA  Literature,  provides  a  quantitative  and  qualitative  analysis  of 
the  published  RIA  literature;  section  IX,  Bibliography,  contains  an 
alphabetically-ordered  list  of  the  >400  references  that  were  used 
for  this  Handbook;  section  X,  Suggested  Further  Reading,  identifies 
about  3100  papers  which  provide  additional  background  and  context 
for  research  evaluation;  and  section  XI,  Most  Highly  Cited  Papers, 
contains  a  list  of  the  450  most  highly  cited  papers  in  RIA. 




TABLE  OF  CONTENTS 


I. BACKGROUND

II. OVERVIEW/EXECUTIVE SUMMARY OF HANDBOOK
OVERVIEW 

EXECUTIVE  SUMMARY 

III.  INTRODUCTION 

IV.  RESEARCH  IMPACT  EVALUATION  TECHNIQUES 
IV-A.  QUALITATIVE  METHODS  (PEER  REVIEW) 

IV-B.  SEMI-QUANTITATIVE  METHODS 

IV-C.  QUANTITATIVE  METHODS 
BIBLIOMETRICS 

COST-BENEFIT/ECONOMIC ANALYSES
COST-EFFICIENCY
CO-OCCURRENCE PHENOMENA
DATABASE TOMOGRAPHY
NETWORK MODELING/ROADMAPS
EXPERT NETWORKS

V.  RECOMMENDED  AREAS  FOR  RESEARCH  IN  RIA 

SEMI-QUANTITATIVE  METHODS 

PEER  REVIEW 

QUANTITATIVE  METHODS 

DATABASE  INFRASTRUCTURE  DEVELOPMENT 

GENERAL 

VI.  RIA  -  SUMMARY  AND  CONCLUSIONS 

VII.  RIA  OPTIONS  FOR  RESEARCH  SPONSORING  ORGANIZATIONS 
SPECIFIC  RECOMMENDATIONS  FOR  AGENCY  RES.  EVAL.  GUIDANCE 
ATTACHMENT  1  -  REQUIREMENTS  FOR  VERTICAL  INTEGRATION 

2  -  CORPORATE  INVESTMENT  STRATEGY 

3  -  DESIRABLE  CHARACTERISTICS  OF  QUALITY  PEER  REVIEW 

4  -  REVIEW  PROTOCOL  FOR  SUCCESSFUL  PEER  REVIEW 

5  -  PROTOCOL:  VERT.  INTEGRATED  PROGRAM  RELEVANCE  ASSESSMENT 

6  -  REVIEW  PANEL  SELECTION  APPROACHES 

7  -  ASSESSMENT  ISSUES  FOR  PRESENTATIONS 

8  -  RESEARCH  PRODUCT  EVOLUTION  TRACKING  DATABASE 

9  -  SAMPLE  GUIDANCE  FOR  QUALITY/RELEVANCE  PROGRAM  REVIEW 

10  -  ESTIMATE  OF  PEER  REVIEW  COST 

11  -  DOE  PROCEDURES  FOR  PEER  REVIEW  ASSESSMENTS 

12  -  EVALUATION  FORMS  FOR  EXISTING  PROGRAMS 

13  -  EVALUATION  CRITERIA  FOR  EXISTING  PROGRAMS 

14  -  EVALUATION  FORMS  FOR  PROPOSED  PROGRAMS 

15  -  EVALUATION  CRITERIA  FOR  PROPOSED  PROGRAMS 

16  -  IDENTIFYING  KEY  REVIEWER  CRITERIA 

17  -  TECHNICAL/PROGRAMMATIC  ISSUES  FOR  PROGRAM  REVIEW 

18  -  REVIEW  PROTOCOL  FOR  SMALL  SEED  MONEY  PROJECTS 

19  -  USE  OF  PUBLISHED  PAPERS  IN  RESEARCH  EVALUATION 

20  -  FUNDS  ALLOCATIONS  BASED  ON  REVIEWERS'  SCORES 

21  -  POTENTIAL  USE  OF  ENTROPY  IN  RESEARCH  EVAL. 

VIII.  ANALYSIS  OF  RIA  LITERATURE 

IX.  BIBLIOGRAPHY 

X.  SUGGESTED  FURTHER  READING 

XI.  MOST  HIGHLY  CITED  PAPERS  IN  RIA  LITERATURE 




I. BACKGROUND


In  research  sponsoring  organizations,  the  selection  and 
continuation  of  research  programs  must  be  made  on  the  basis  of 
outstanding  science  and  potential  contribution  to  the 
organization's  mission.  There  have  been  increasing  pressures  to 
link  science  and  technology  programs  and  goals  even  more  closely 
and  clearly  to  organizational  as  well  as  broader  societal  goals. 
This  is  reflected  in  a  number  of  studies  [Brown,  1992;  NAS,  1992; 
Carnegie,  1992],  in  the  controversial  National  Institutes  of  Health 
strategic  planning  process,  in  the  controversial  statements  by  the 
previous  National  Science  Foundation  director  about  closer 
alignment  with  industry  and  other  government  agencies,  and  in 
conversations  with  numerous  government  officials. 

In  tandem  with  the  pressures  for  more  strategic  research  goals 
are  motivations  to  increase  research  assessments  and  reporting 
requirements to ensure that the increasingly strategic research
goals  are  being  pursued  by  proposed  and  existing  research  programs. 
The  1992  Congressional  Task  Force  report  on  the  health  of  research 
[Brown,  1992]  stated,  as  one  of  its  two  recommendations:  "Integrate 
performance  assessment  mechanisms  into  the  research  process  using 
legislative  mandates  and  other  measures,  to  help  measure  the 
effectiveness  of  Federally  funded  research  programs". 

The Government Performance and Results Act of 1993 (GPRA; Public Law
103-62) was enacted on August 3, 1993. This Act provides for the
establishment  of  strategic  planning  and  performance  measurement  in 
the  Federal  government,  and  for  other  purposes.  Not  only  will  the 
Federal  agencies  be  required  to  establish  performance  goals  for 
program  activities,  but  as  the  law  states,  they  will  be  required  to 
establish  performance  indicators  to  be  used  in  measuring  or 
assessing  the  relevant  outputs,  service  levels,  and  outcomes  of 
each  program  activity. 

A  pilot  program  was  established  to  identify  appropriate 
measures  and  procedures  that  could  be  applied  to  different  agencies 
and  different  types  of  programs,  and  would  satisfy  the  GPRA 
requirements.  Some  strengths  and  weaknesses  of  the  process  as 
applied to R&D have surfaced already [Brown, 1996]. A recent paper
in Science [Kostoff, 1997h] identified potential problems for basic
research  if  the  GPRA  metrics  are  used  as  the  main  performance 
indicators.  This  paper  proposed  an  alternate  approach  for 
evaluating  the  progress  and  performance  of  basic  research. 

Due  to  increased  world  competition,  and  the  trends  toward 
corporate  downsizing,  parallel  pressures  exist  for  industrial 
research  organizations  to  link  research  programs  more  closely  with 
strategic  corporate  goals  and  to  increase  research  performance  and 
productivity.  In  tandem  with  the  increasing  governmental  interests 
in  research  assessment  stated  above,  there  is  considerable 
industrial  interest  in  research  assessment  as  well.  As  an  example, 
the Industrial Research Institute (IRI), whose 260 member companies
invest  over  $55  billion  annually  in  R&D,  has  shown  intense  interest 
in  measuring  research  performance  and  effectiveness.  The  IRI  has 
commissioned  one  of  its  internal  panels  (headed  by  Dr.  James  W. 
Tipping) to research the field and write a position paper on
measuring and improving the effectiveness of R&D on company
performance.  According  to  Dr.  Tipping,  two  roundtables  on  this 
subject  have  been  held.  They  have  been  oversubscribed  but  limited 
to  50  companies  [Tipping,  1993]. 

When  the  above  activities  are  integrated  and  placed  into  a 
mosaic, the inescapable trend for the future becomes clear. The
research  sponsoring  agencies  will  become  more  accountable  to  the 
Administration  and  Congress  on  the  relationship  between  sponsored 
programs  and  strategic  goals,  and  soon  thereafter  the  research 
performers  will  become  more  accountable  to  the  sponsoring  agencies. 
In  addition,  the  accountability  of  industrial  research  to  the 
broader  corporate  goals  will  increase  (as  has  been  observed  over 
the past decade), and improved methods of measuring research
performance  and  productivity  will  be  sought  continually  by 
industrial  research  organizations.  It  is  important  that  research 
managers  and  administrators  in  government,  industry,  and  academia 
understand  the  assessment  approaches  which  could  be  utilized  to 
evaluate  research  quality  and  goal  relevance,  and  that  researchers 
gain  an  understanding  of  these  evaluation  approaches  as  well. 

In  the  Congressional  Task  Force  report  on  the  health  of 
research  [Brown,  1992]  mentioned  above,  the  authors  recognized  the 
difficulty  of  integrating  performance  assessment  mechanisms  into 
the  research  process.  In  addressing  the  difficulty  of  implementing 
this  recommendation,  the  report  stated  further:  "More  daunting  than 
political  resistance  to  performance  assessment  are  the  technical 
obstacles.  Because  policy-oriented  assessment  has  not  been  a  part 
of  the  research  process  in  the  past,  its  implementation  must  be 
both  gradual  and  flexible.  There  are  some  initial  efforts 
underway". The reference in the Task Force report for these
'initial  efforts'  [Kostoff,  1992a]  is  the  text  of  a  presentation  by 
the  author  at  the  Third  International  Conference  on  Management  of 
Technology. 

The  present  Handbook  integrates  and  updates  the  results  from 
Kostoff  [1992a]  and  subsequent  studies  [Kostoff,  1992d,  1993a-g, 
1994a-l,  1995a-e,  1996a-c,  1997a-q;  Odeyale  and  Kostoff,  1994a-c; 
Zurcher  and  Kostoff,  1997]  concerned  with  research  impact 
assessment (RIA). The front part starts by identifying the many
facets  of  research  impact,  then  focuses  on  strengths  and  weaknesses 
of  selected  major  techniques  used  in  practice  by  the  Federal 
government  to  assess  research  impact.  It  ends  by  identifying 
promising  research  opportunities  for  advancing  the  field  of  RIA. 

II. OVERVIEW/EXECUTIVE SUMMARY OF HANDBOOK

OVERVIEW 

UNDERUTILIZATION  OF  RIA 

Research,  the  pursuit  and  production  of  knowledge,  has  become 
a  substantial  investment  in  the  U.  S.  and  the  rest  of  the  developed 
world  today.  Depending  on  what  is  defined  specifically  as  research 
in practice, public and private investment in research in the U.S.
alone amounts to tens of billions of dollars per year. In 1990,
for example, Federal support for basic and applied research
approximated  $22B,  about  47  percent  of  total  support  for  research 
in  the  U.S.  [OTA,  1991].  Typically,  with  investments  of  this 
magnitude,  project  selection  and  management  are  performed  using  the 
latest  techniques  available.  Project  payoff  is  estimated  using  the 
latest  techniques  and  algorithms  available.  In  addition, 
assessments  of  a  large  magnitude  investment  are  done  on  a 
continuing  basis,  and  there  is  a  continual  feedback  loop  to  assure 
the  investment  will  achieve  its  goals  and  targets. 

While  the  methods  used  in  the  performance  of  research 
continually  advance  the  state-of-the-art,  the  methods  used  for  its 
identification  and  selection  have  changed  little  in  decades.  In 
evaluation  and  assessment  of  existing  and  completed  research,  not 
only  have  the  methods  in  practice  changed  little  with  time,  but  the 
numbers  of  organizations  which  use  any  but  the  most  rudimentary 
methods  also  remain  a  handful.  While  the  scientific  and  social 
science  literatures  abound  with  advanced  methodologies  for 
identifying  and  selecting  new  research,  managing  existing  research, 
and  evaluating  and  assessing  research  retrospectively,  the 
implementation  of  these  methods  by  the  research  sponsoring 
community  remains  minimal. 

REASONS  FOR  UNDERUTILIZATION  OF  RIA 

The reasons for reluctance to implement RIA vary [Kostoff,
1994f]. The rewards in research and research management go to new
discoveries, not to quality assessments. Neither the costs nor the
time requirements of RIA are negligible, and both have to be weighed
against the additional research which could be performed. More
immediate organizational requirements are assigned higher priority
than  RIA.  For  example,  an  OTA  assessment  of  the  defense  technology 
base  states:  "OSD  [Office  of  the  Secretary  of  Defense-RNK] 
personnel  spend  a  large  part  of  their  time  defending  technology 
base  programs  or  answering  congressional  mail,  leaving  little  time 
available to evaluate technology base programs" [OTA, 1989].

The  RIA  outcomes  are  not  always  predictable  or  positive  from 
a  micro  viewpoint,  and  'pet'  projects  may  be  terminated  after  a 
rigorous  evaluation.  Any  negative  results  from  an  RIA  may  provide 
executive  or  legislative  branch  overseers,  or  corporate  management, 
ammunition  for  budget  reductions.  Finally,  since  there  is  very 
little  experience  with  use  of  advanced  evaluation  techniques,  there 
is  insufficient  evidence  at  present  that  use  of  advanced  evaluation 
techniques  will  result  in  better  payoff  than  use  of  rudimentary 
techniques.  To  many  research  managers  and  administrators,  there  is 
little  to  be  gained  from  RIA,  and  a  potential  for  loss. 

BENEFITS  OF  INCREASED  UTILIZATION  OF  RIA 

However,  with  the  ascendency  of  Total  Quality  Management  in 
many  organizations,  and  with  decreasing  budgets  and  increased 
competitiveness at many levels, the motivation for a better
understanding of the quantitative and qualitative measures of
research impact has escalated in importance. Motivation to
incorporate RIA as a permanent component of an organization's
mode  of  operation,  and  determination  to  use  the  latest 
technological  advances  consistent  with  an  organization's  RIA 
requirement  could  have  significant  consequences  at  the 
organizational  and  national  levels. 

One  major  benefit  would  be  to  improve  organizational 
efficiency.  A  properly  executed  RIA  would  target  the  people  and 
the  exogenous  variables  (management  climate,  funding  conditions, 
infrastructure,  etc.)  necessary  to  increase  research  output 
relevant  to  the  organization's  goals.  An  RIA  which  increased 
communication  among  the  researchers  and  potential  research 
customers  during  the  conduct  of  research  would  allow  a  smoother 
conversion  of  the  products  of  research  to  technology  through  better 
integration  of  the  users  with  the  research  performers. 

Another  major  benefit  would  be  to  identify  the  diverse  impacts 
of  basic  research.  The  impacts  of  basic  research  are  pervasive 
throughout  a  technological  society,  but  for  the  most  part  the 
impacts  of  basic  research  are  indirect  on  technologies,  systems, 
and  end  products.  A  major  limitation  of  articulating  the  benefits 
of  basic  research  has  been  the  lack  of  data  which  could  show  the 
pathways  and  linkages  through  which  the  research  impacts  the 
intermediate  or  end  products.  A  credible  RIA  of  completed  research 
would  trace  the  dissemination  of  the  research  products  through  the 
many  communication  channels  and  would  identify  the  multitude  of 
near  and  long  term  research  impacts  (impact  on  other  research 
fields,  impact  on  technology,  impact  on  systems,  impact  on 
education, etc.). Having these data would give those who control the
allocation of research funds more substantive arguments for
continuing to provide the necessary funds.

RECENT  RIA  STUDIES 

One  objective  of  the  author's  recent  studies  and  the  present 
Handbook  is  to  identify  many  of  the  advanced  and  credible  RIA 
approaches  in  use,  or  available  today,  and  to  enumerate  both  their 
strengths  and  weaknesses.  Since  research  impact  has  many  facets, 
its  assessment  must  use  as  many  methods  and  as  many  types  of 
experts  as  required  to  address  as  many  of  these  components  as 
possible.  Credible  assessments  will  then  weight  the  results  of  the 
different  facet  assessments  relative  to  the  different 
organizational  goals,  and  arrive  at  conclusions  optimal  to  the 
organization's  interests. 
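
The weighting step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the facet names, the 0-10 scoring scale, the weights, and the function `weighted_assessment` are all hypothetical and do not come from any of the studies cited in this Handbook. A normalized weighted average is one simple way to combine facet results; an organization could equally choose another aggregation rule.

```python
from typing import Dict


def weighted_assessment(facet_scores: Dict[str, float],
                        goal_weights: Dict[str, float]) -> float:
    """Combine per-facet scores into one figure using goal-derived weights.

    Hypothetical helper: returns the weighted average of the facet scores,
    normalized by the sum of the weights actually used.
    """
    missing = set(facet_scores) - set(goal_weights)
    if missing:
        raise ValueError(f"no weight supplied for facets: {sorted(missing)}")
    total_weight = sum(goal_weights[f] for f in facet_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(facet_scores[f] * goal_weights[f] for f in facet_scores)
    return weighted_sum / total_weight


# Hypothetical facet scores on an assumed 0-10 scale.
scores = {"research_quality": 8.0, "technology_impact": 6.0, "education_impact": 4.0}
# Hypothetical weights reflecting how much each facet matters to the organization.
weights = {"research_quality": 0.5, "technology_impact": 0.3, "education_impact": 0.2}

overall = weighted_assessment(scores, weights)
print(f"overall weighted assessment: {overall:.2f}")
```

Changing the weights to reflect a different set of organizational goals changes the overall figure without re-scoring any facet, which is the practical point of keeping the facet assessments and the goal weighting separate.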

Combinations  of  RIA  approaches  are  recommended  when  performing 
a  full  assessment.  While  the  readers  schooled  in  systems 
reliability  may  question  how  the  results  from  multiple  imperfect 
approaches are improved as the number of approaches increases,
experience  has  shown  that  a  more  acceptable  product  does  result 
when  different  approaches  are  used.  The  effect  appears  to  be 
additive  rather  than  multiplicative.  When  different  RIA  approaches 
result in similar findings, the user will have confidence in the
general theme of the results. When different approaches produce
conflicting results, much value and understanding are gained by
trying to understand the causes of the differences and then
resolving them.
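
The systems reliability objection raised above can be made concrete with a small numerical sketch. It is not the Handbook's own model: the per-approach reliability of 0.7, the independence assumption, and both function names are hypothetical. The sketch contrasts a series arrangement, where every approach must be right and the reliabilities multiply, with a redundant arrangement, where the approaches cross-check one another and it is enough that at least one is right.

```python
from math import prod
from typing import Sequence


def series_reliability(reliabilities: Sequence[float]) -> float:
    """Probability that every approach is right (independent, in series)."""
    return prod(reliabilities)


def redundant_reliability(reliabilities: Sequence[float]) -> float:
    """Probability that at least one approach is right (independent, redundant)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Three hypothetical assessment approaches, each assumed right 70% of the time.
approaches = [0.7, 0.7, 0.7]

print(f"series:    {series_reliability(approaches):.3f}")
print(f"redundant: {redundant_reliability(approaches):.3f}")
```

Under these assumptions the series figure falls as approaches are added while the redundant figure rises, which is one way to see why combining imperfect approaches need not degrade the overall result when they are used to check each other rather than chained together.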

Another  objective  of  the  recent  studies  and  this  Handbook  is 
to  show,  somewhat  indirectly,  that  while  there  is  a  significant  gap 
between  the  RIA  methods  available  in  the  literature  and  the  RIA 
methods  actually  in  use,  there  is  also  a  substantial  gap  between 
the  technologies  becoming  available  from  the  research  laboratories 
(such  as  information  management  and  processing)  and  the 
technologies employed in the published methods. In the U.S.,
Federal support for developing the assessment methodologies which
use the latest technologies has lagged behind other parts of the world.
A  cursory  reading  of  the  relevant  literature  shows  that  in  the  past 
two  decades  the  U.S.  efforts  in  this  field  have  advanced  at  a  very 
slow  pace,  and  in  many  subfields  the  U.  S.  has  been  surpassed  by 
other  nations,  notably  those  of  Western  Europe.  If  it  is  assumed 
that  improved  RIA  will  lead  to  a  more  efficient  allocation  of 
research  resources,  then  in  the  highly  competitive  research  and 
technology  based  world  which  has  evolved,  the  U.  S.  cannot  afford 
to  continue  business  as  usual  in  its  treatment  of  research  and  its 
impacts.  It  is  hoped  that  this  Handbook  will  help  spur  the  Federal 
government,  and  private  sources  as  well,  to  focus  a  concerted 
effort  in  advancing  the  techniques  and  implementation  of  RIA. 

The  first  part  of  this  Handbook  is  divided  into  three 
segments,  which  range  from  qualitative  to  quantitative  approaches. 
The  first  segment  deals  with  qualitative  approaches  to  RIA. 
Foremost  among  these  are  variants  on  the  common  theme  of  peer 
review. While peer review (evaluation of research and its
consequences by 'peers', or experts on the different facets of
research and its impacts) is the method used most widely to
evaluate  research,  it  has  its  detractors,  as  will  be  shown  in  this 
Handbook.  Because  of  cost  and  subjectivity,  other  methods  to 
complement  or  replace  peer  review,  and  which  are  perhaps  less 
costly  and  more  objective,  are  being  actively  pursued. 

The  second  segment  deals  with  semi-quantitative  approaches. 
These  methods  make  little  use  of  mathematical  tools  but  attempt  to 
draw  on  documented  approaches  and  results  wherever  possible.  They 
have  limited  credibility  in  the  analytic  community,  since  the 
selection  of  innovations  to  be  analyzed  tends  to  be  arbitrary 
rather  than  mathematically  rigorous,  and  they  are  viewed  more  as 
anecdotal approaches than serious technical approaches.
Nevertheless,  in  practice,  some  of  these  approaches  (namely, 
studies  of  accomplishments  resulting  from  sponsored  research 
programs,  or  studies  of  systems  and  the  research  products  which 
were  eventually  converted  and  incorporated  into  those  systems)  are 
widely  used  by  the  research  sponsoring  organizations. 

The  third  segment  deals  with  the  quantitative  and  fiscal 
approaches  to  RIA.  These  approaches  make  heavy  use  of  mathematical 
and  analytic  tools,  and  utilize  computer  capabilities  extensively. 
Probably the heaviest concentration of papers in the literature today is
in this category. It should be noted that there are hybrid
techniques  which  span  more  than  one  of  the  three  categories.  For 
example,  a  retrospective  study  of  significant  events  in  cancer 
research  [Narin,  1989]  included  a  bibliometric  component  (citation 
and co-citation analyses).

EXECUTIVE  SUMMARY 

GENERAL  PRINCIPLES  AND  CONCLUSIONS 

There  are  some  general  principles,  findings,  and  conclusions 
when  the  different  methods  described  in  this  Handbook  and  their 
results  are  integrated  and  interpreted.  First  and  foremost  is  the 
role  of  motivation  and  associated  incentives.  The  research 
managers  and  administrators,  and  those  with  responsibility  for 
higher  level  oversight,  have  to  be  convinced  of  the  value  of  RIA  to 
their  organizations  for  the  improved  allocation  of  research 
resources.  More  important  than  any  evaluation  criteria  selected  is 
the  dedication  of  an  organization's  management  to  the  highest 
quality  objective  review,  and  the  associated  emplacement  of  rewards 
and  incentives  to  encourage  quality  reviews.  The  team  assigned 
responsibility  to  carry  out  RIA  must  be  motivated  to  generate  the 
highest  quality  product,  not  just  'answer  the  mail',  as  is  done  in 
many  organizations  today.  This  means  selecting  the  best  suite  of 
methods available to accomplish organizational objectives, and
selecting  the  most  competent  and  objective  individuals  to 
participate  in  the  RIA.  The  RIA  managers  must  be  motivated  to 
examine  the  impact  from  as  many  perspectives  as  possible,  to  gain 
the  most  complete  understanding.  Finally,  the  objectives, 
importance,  and  benefits  of  RIA  must  be  articulated  and 
communicated  to  the  researchers  and  research  managers  at  the 
initiation  of  RIA,  so  that  the  reviewees  will  participate  in  the 
RIA  as  fully  and  as  cooperatively  as  possible. 

The  total  R&D  process  in  an  organization  should  be  designed  to 
include  RIA  as  an  integral  component,  not  as  an  afterthought  or  an 
add-on.  This  will  allow  an  orderly  and  continuous  monitoring  of 
the  full  research  selection,  review,  and  post-mortem  analysis 
process, and ensure that the best research consistent with the
organization's  goals  is  being  funded.  The  evaluation  methods 
selected  should  not  be  overly  complex  or  require  massive  permanent 
staffs,  and  should  offer  minimum  interference  in  the  performance  of 
the  research  [Robb,  1994].  Most  managers  regard  applying  overly 
elaborate  and  rigorous-seeming  techniques  to  industrial  R&D  as 
inappropriate  [Nelson,  1994].  A  reasonable  fraction  of  the  R&D 
budget  should  be  allocated  for  RIA  purposes,  and  advancement  along 
a  career  path  for  RIA  professionals  should  parallel  that  of  the 
research  performers. 

An  RIA  should  be  conducted  with  maximum  access  to,  and 
awareness  of,  information  about  research  and  technology  development 
being pursued throughout the world. Access to information on
existing technology would also be useful. This information will
help  determine  whether  the  research  being  assessed  is  breaking  new 
ground and, for high-tech organizations, whether the research being
assessed  is  improving  existing  or  developing  technology. 

Optimally,  a  database  which  contains  this  information  would  be 
available  to  those  conducting  an  RIA.  Over  the  past  five  years, 
substantial  progress  has  been  made  in  developing  a  database  of 
federally-sponsored  science  and  technology  development.  In  1991, 
the  author  developed  a  Federal  multiagency  funded  research  programs 
database  which  contained  narrative  descriptions  of  90,000  projects. 
Over  the  past  two  years,  the  RADIUS  database  has  been  developed  by 
RAND-CTI.  This  database  contains  narrative  descriptions  of  over 
200,000  projects,  and  describes  federal  agency  S&T  at  five 
different  hierarchical  levels.  However,  a  comprehensive  research 
and  (developing  and  existing)  technology  database  that  incorporates 
government  and  industry  programs,  both  domestic  and  foreign, 
remains  to  be  developed.  Construction  of  such  a  database  would 
require  cooperation  among  Federal  research  sponsoring  agencies  and 
private  organizations,  domestic  and  foreign,  at  a  minimum. 

For organizations which sponsor substantial basic research,
the  RIA  should  be  structured  to  identify  impacts  which  occur  many 
decades  after  the  research  is  performed.  The  reasons  for  this  are 
twofold.  First,  the  impacts  of  basic  research  on  organizational 
missions  such  as  systems  and  operations  can  take  decades  before 
they  are  realized.  Second,  these  organizational  mission  impacts 
will  provide  data  for  predictive  models  that  relate  research 
evaluation  results  to  organizational  mission  impacts.  Also,  the 
indirect  impacts  of  the  research  must  receive  a  proper  accounting. 
These indirect impacts contribute to an ever expanding pool of
knowledge, and it is the level of this pool which limits the rate of
advance of mission-oriented research, and thereby of technology and
systems growth. While the determination of indirect impacts is
complex and data intensive, it is absolutely necessary for a
credible RIA.

The  present  Handbook  addresses  the  predictive  reliability  of 
the  RIA  processes  very  briefly,  mainly  because  there  is  little 
literature  which  provides  the  basis  for  predicting  which  research 
programs/proposals  will  have  the  desired  downstream  impact.  For 
example,  the  relationship  between  a  proposal's  peer  review  score  or 
a  project's  bibliometric  rating  and  the  downstream  impact  on  an 
organization's mission is not addressed in published studies,
although some initial efforts have begun [Van den Beemt,
1991]. One could raise the question, as many active researchers
have,  as  to  whether  there  is  value  to  any  of  these  assessment 
techniques,  since  their  predictive  value  is  unknown.  The 
credibility  and  predictability  of  these  assessment  techniques  are 
ripe  topics  for  research.  A  long  term  tracking  system  for  research 
product  evolution  would  be  required  to  gather  the  necessary  data. 
The  system  would  require  agreement  and  coordination  from  a  number 
of  the  larger  Federal  research  sponsoring  agencies,  and  maybe  from 
industrial  organizations  as  well.  While  such  a  system  would  not 
provide  absolute  answers,  since  tracking  of  the  informal  modes  of 
knowledge  communication  would  be  almost  impossible,  it  would 
provide  a  much  better  picture  of  research  impact  and  its 
predictability than exists now. With the present state of
information  storage  and  processing  capabilities,  research  product 
evolution  tracking  is  an  idea  whose  time  has  come. 

PEER  REVIEW  SUMMARY 

Peer  review  of  research  represents  evaluation  by  experts  in 
the  field,  and  is  the  method  of  choice  in  practice  in  the  U.  S. 
[Salasin,  1980;  Logsdon,  1985;  Chubin,  1990,  1994;  Kostoff,  1993b]. 
Its  objectives  range  from  being  an  efficient  resource  allocation 
mechanism  to  a  credible  predictor  of  research  impact. 

Requirements  for  High  Quality  Peer  Review 

Many  studies  related  to  peer  review  have  been  reported  in  the 
literature,  ranging  from  the  mechanics  of  conducting  a  peer  review, 
to  examples  of  peer  reviews,  to  detailed  critiques  of  peer  reviews 
and the process itself (e.g., Barker [1992], Chubin [1990, 1994],
Cicchetti [1991], Cole [1978, 1981a, 1981b], Cozzens [1987], DOD
[1987], DOE [1982, 1993], Frazier [1987], Kostoff [1988], Logsdon
[1985], Ormala [1989], OTA [1986], Salasin [1980], and Nicholson
[1987]). A non-standard peer review approach for concept
comparisons  is  the  Science  Court.  As  in  a  legal  procedure,  it  has 
well  defined  advocates,  critics,  a  jury,  etc.  It  was  applied  by 
the  author  to  a  review  of  alternate  fusion  concepts  in  1977  [DOE, 
1978].  This  procedure  had  substantial  debate  and  surfacing  of 
crucial  issues,  but  it  was  time-consuming  compared  to  a  standard 
panel  assessment. 

While  these  reported  studies  present  the  process  mechanics, 
the  procedures  followed,  and  the  review  results,  the  reader  cannot 
ascertain  the  quality  of  the  review  and  the  results.  In  practice, 
procedure  and  process  quality  are  mildly  necessary,  but  nowhere  near
sufficient,  conditions  for  generating  a  high  quality  peer  review.
Many  useful  peer  reviews  have  been  conducted  using  a  broad  variety 
of  processes,  and  while  well  documented  modern  processes  (e.g.,  DOE 
[1993])  may  contribute  to  the  efficiency  of  conducting  a  review, 
more  than  process  is  needed  for  high  quality.  There  are  many 
intangible  factors  that  enter  into  a  high  quality  review,  and 
before  examples  of  reviews  are  presented  in  the  main  body  of  this 
Handbook,  some  of  the  more  important  factors  will  be  discussed. 

The  desirable  characteristics  of  a  peer  review  can  be 
summarized  as  [Chubin,  1994]: 

1.  an  effective  resource  allocation  mechanism; 

2.  an  efficient  resource  allocator; 

3.  a  promoter  of  science  accountability; 

4.  a  mechanism  for  policymakers  to  direct  scientific  effort; 

5.  a  rational  process; 

6.  a  fair  process; 

7.  a  valid  and  reliable  measure  of  scientific  performance. 

High  quality  peer  reviews  require  as  a  minimum  the  conditions 
summarized  from  Ormala  [1989]: 

1.  The  method,  organization  and  criteria  for  an  evaluation
should  be  chosen  and  adjusted  to  the  particular  evaluation 
situation; 

2.  Different  levels  of  evaluation  require  different  evaluation 
methods;

3.  Program  and  project  goals  are  important  considerations  when 
an  evaluation  study  is  carried  out; 

4.  The  basic  motive  behind  an  evaluation  and  the  relationships 
between  an  evaluation  and  decision  making  should  be  openly 
communicated  to  all  the  parties  involved; 

5.  The  aims  of  an  evaluation  should  be  explicitly  formulated; 

6.  The  credibility  of  an  evaluation  should  always  be  carefully 
established; 

7.  The  prerequisites  for  the  effective  utilization  of 
evaluation  results  should  be  taken  into  consideration  in  evaluation 
design. 

Assuming  these  considerations  have  been  taken  into  account, 
three  of  the  most  important  intangible  factors  for  a  successful 
peer  review  are:  Motivation,  Competence,  and  Independence.  The
review  leader's  motivation  to  conduct  a  technically  credible  review
is  the  cornerstone  of  a  successful  review.  The  leader  selects  the 
reviewers,  summarizes  their  comments,  guides  the  questions  and 
discussions  in  a  panel  review,  and  makes  recommendations  about 
whether  the  proposal  should  be  funded.  The  quality  of  a  review 
will  never  go  beyond  the  competence  of  the  reviewers.  Two 
dimensions  of  competence  which  should  be  considered  for  a  research 
review  are  the  individual  reviewer's  technical  competence  for  the 
subject  area,  and  the  competence  of  the  review  group  as  a  body  to 
cover  the  different  facets  of  research  issues  (other  research 
impacts,  technology  and  mission  considerations  and  impacts, 
infrastructure,  political  and  social  impacts).  The  quality  of  a
review  is  limited  by  the  biases  and  conflicts  of  the  reviewers. 
The  biases  and  conflicts  of  the  reviewers  selected  should  be  known 
to  the  leader  and  to  each  other. 

A  broad  range  of  reviewer  expertise  enhances  the  review 
results  substantially.  A  key  component  of  the  process  reported  in 
Kostoff  [1988]  was  the  use  of  mixed  levels  of  reviewers  on  the 
panels  to  evaluate  the  different  potential  impacts  of  research. 
The  panels  included: 

1.  bench-level  researchers  to  address  the  impact  of  the 
proposed  research  on  its  field; 

2.  broad  research  managers  to  address  potential  impact  on
allied  research  fields;

3.  technologists  to  address  potential  impact  on  technology  and
the  potential  of  the  research  to  transition  to  higher  levels  of
development;

4.  systems  specialists  to  address  potential  impact  on  systems 
and  hardware; 

5.  operational  naval  officers  to  address  the  potential  impact 
on  naval  operations. 

The  presence  of  reviewers  with  different  research  target
perspectives  and  levels  of  understanding  on  one  panel  provided  a 
depth  and  breadth  of  comprehension  of  the  different  facets  of  the 
research  impact  that  could  not  be  achieved  by  segregating  the 
science  and  utility  components  into  separate  panels  and 
discussions. 

Nearer-term  research  impacts  typically  play  a  more  important 
role  in  the  review  outcome  than  longer-term  impacts,  but  do  not 
have  quite  the  importance  of  team  quality,  research  approach,  or 
the  research  merit.  A  minimal  set  of  review  criteria  should 
include  team  quality,  research  merit,  research  approach, 
productivity,  and  mission  relevance. 

The  best  features  of  different  organizations'  peer  review 
practices  can  be  combined  into  a  heuristic  protocol  for  the  conduct 
of  successful  peer  review  research  evaluations  and  impact
assessments.  The  main  aims  of  the  protocol  are  to  insure  that  the 
final  assessment  product  has  the  highest  intrinsic  quality  and  that 
the  assessment  process  and  product  are  perceived  as  having  the 
highest  possible  credibility.  The  protocol  elements  are: 

PEER  REVIEW  RESEARCH  EVALUATIONS 

1.  The  objectives  of  the  assessment  must  be  stated  clearly  and 
unambiguously  at  the  initiation  of  the  assessment  by  the  highest 
levels  of  management,  and  the  full  support  of  top  management  must 
be  given  to  the  assessment.  In  turn,  the  objectives,  importance, 
and  urgency  of  the  assessment  must  be  articulated  and  communicated 
down  the  management  hierarchy  to  the  managers  and  performers  whose 
research  is  to  be  assessed,  and  the  cooperation  of  these  reviewees 
must  be  enlisted  at  the  earliest  stages  of  the  assessment; 

2.  The  final  assessment  product,  the  audience  for  the  product, 
and  the  use  to  be  made  of  the  product  by  the  audience  should  be 
considered  carefully  in  the  design  of  the  assessment; 

3.  One  person  should  be  assigned  to  manage  the  assessment  at
the  earliest  stage,  and  this  person  should  be  given  full  authority
and  responsibility  for  the  assessment;

4.  The  assessment  manager  should  report  to  the  highest
organizational  level  possible  in  order  to  insure  maximum
independence  from  the  research  units  being  assessed;

5.  The  reviewers  should  be  selected  to  represent  a  wide
variety  of  viewpoints,  in  order  to  address  the  many  different 
facets  of  research  and  its  impact  [Kostoff,  1988].  These  would 
include  bench-level  researchers  to  address  the  impact  of  the 
proposed  research  on  the  field  itself;  broad  research  managers  to 
address  potential  impact  on  allied  research  fields;  technologists 
to  address  potential  impact  on  technology  and  the  potential  of  the 
research  to  transition  to  higher  levels  of  development;  systems 
specialists  to  address  potential  impact  on  systems  and  hardware; 
and  operational  personnel  to  address  the  potential  impact  on 
downstream  organizational  operations.  The  reviewers  should  be 
independent  of  the  research  units  being  evaluated,  and  independent 
of  the  assessing  organization  where  possible.  The  objectives  of,
and  constraints  (if  any)  on,  the  assessment  should  be  communicated
to  the  reviewers  at  the  initial  contact; 

6.  Maximum  background  material  describing  the  research  to  be
assessed,  related  research  and  technology  development  sponsored  by 
external  organizations,  the  organization  structure,  and  other 
factors  pertinent  to  the  assessment,  should  be  provided  to  the 
reviewers  as  early  as  possible  before  the  review.  This  will  allow 
the  reviewers  and  presenters  to  use  their  time  most  productively 
during  the  review; 

7.  Recommendations  resulting  from  the  assessment  should  be
tracked  to  insure  that  they  are  considered  and  implemented,  where 
appropriate.  For  research  programs,  planning,  execution,  and 
review  are  linked  intimately.  Feedback  from  the  review  outcomes  to 
planning  for  the  next  cycle  should  be  tracked  to  insure  that  the 
review/planning  coupling  is  operable.

LEVELS  OF  ORGANIZATIONAL  RESEARCH  EVALUATION 

1.  Evaluations  should  be  performed  at  three  levels  of 
resolution  in  the  organization. 

1a.  The  highest  level  would  be  an  annual  corporate  level
review  of  how  the  organization  performs  research.  If  the
organization  has  a  separate  research  unit,  then  the  unit  should  be
evaluated  as  an  integrated  whole.  If  research  is  vertically
integrated  with  development,  then  the  research  should  preferably  be 
evaluated  as  part  of  a  total  organization  R&D  review.  The  charter 
of  this  highest  level  assessment  would  be  to  review,  at  the 
corporate  level,  general  policy,  organization,  budget,  and  programs 
(e.g.,  NIST,  [1991]).  Total  inputs  and  outputs,  including 
integrated  bibliometric  indicators,  would  be  examined.  Overall 
research  management  processes  would  be  examined,  such  as  selection, 
execution,  review,  and  technology  transfer  of  research.  The 
overall  investment  strategy  would  be  evaluated,  and  would  include 
different  perspectives  of  the  program,  such  as  technical 
discipline,  performer,  and  end  use  allocation.  The  integration  of 
the  research  objectives  with  the  larger  organization  objectives 
would  be  assessed.  The  evaluators  would  include,  but  not  be 
limited  to,  representatives  of  the  stakeholder,  customer,  and  user 
community  whose  potential  conflicts  with  the  organization  are 
minimal.

1b.  The  second  level  would  be  triennial  peer  review  of  a
discipline  or  management  unit  at  the  program  level  (e.g.,  Kostoff,
[1988,  1994b]),  where  a  program  is  defined  as  an  aggregation  of
work  units  (Principal  Investigators).  If  the  organization  has  a
separate  research  unit,  then  the  discipline  should  be  evaluated  as 
an  integrated  whole.  In  the  nominal  review,  quality  and  relevance 
could  be  evaluated  concurrently.  If  research  is  vertically 
integrated  with  development,  then  the  research  should  preferably  be 
evaluated  as  part  of  a  total  vertical  structure  R&D  review.  In  the 
nominal  vertical  structure  review,  quality  and  relevance  should 
preferably  be  evaluated  separately.  Thus,  research  evaluation  must 
take  into  account  how  research  is  structured,  integrated,  and 
managed  within  an  organization.  Research  quality  criteria  should
include  research  merit,  research  approach,  productivity,  and  team
quality.  Relevance  criteria  should  include  short  term  impact
(transitions  and/or  utility),  long  term  potential  impact,  and  some
estimate  of  the  probability  of  success  of  attaining  each  type  of
impact.  While  the  emphasis  is  on  peer  review,  bibliometric  and
other  types  of  indicators  should  be  utilized  to  supplement  the  peer
evaluation. 

1c.  The  third  level  would  be  a  minimum  of  triennial  peer
review  at  the  work  unit  (Principal  Investigator)  level  (e.g.,  DOE,
[1993]).  Most  of  the  program  level  issues  described  above  are
applicable  and  need  not  be  repeated  here.

1d.  For  each  of  these  three  levels  of  review,  the  following
criteria  and  issues  should  be  considered  during  the  review  as
appropriate.

1e.  CRITERIA  TO  BE  CONSIDERED

1ei.  Quality  and  uniqueness  of  the  work

1eii.  Scientific  and  technological  opportunities  in  areas  of
likely  organization  mission  importance

1eiii.  Need  to  establish  a  balance  between  revolutionary  and
evolutionary  work

1eiv.  Position  of  the  work  relative  to  the  forefront  of  other
efforts

1ev.  Responsiveness  to  present  and  future  organization  mission
requirements

1evi.  Possibilities  of  follow-on  programs  in  higher  R&D
categories

1evii.  Appropriateness  of  the  efforts  for  organization  vice
other  organizations

1eviii.  Other  organization  connection  (coordination)  of  the
work


1f.  QUESTIONS  TO  BE  ASKED  OF  ORGANIZATION  PROGRAMS

1fi.  What  is  the  investment  strategy  of  the  larger  management
unit?  This  would  include  the  relative  program  priorities,  the
actual  investment  allocation  to  the  different  programs,  and  the
rationale  for  the  investment  allocation.  For  each  program  being
reviewed,  what  is  the  investment  strategy  for  its  thrust  areas?

1fii.  What  are  we  trying  to  do  (in  a  systems  concept)?

1fiii.  Can  specific  advantage  to  the  organization  be
identified  if  the  program  is  successful?

1fiv.  How  is  the  system  done  today  and  what  are  the
limitations  of  the  current  practice?

1fv.  Would  the  work  be  supported  if  it  were  not  already
underway?

1fvi.  Assuming  success,  what  difference  does  it  make  to  the
user  in  a  mission  area  context?

1fvii.  What  is  the  technical  content  of  the  program  and  how
does  it  fit  with  other  ongoing  efforts  in  academia,  industry,
organization  labs,  other  labs,  etc.?

1fviii.  What  are  the  decision  milestones  of  the  program?

1fix.  How  long  will  the  program  take;  how  much  will  the
program  cost;  what  are  the  mid-term  and  final  objectives  of  the
program?

In  Europe,  another  development  line  has  been  to  commission 
evaluation  experts  either  to  support  panels  or  to  conduct 
independent  assessments  which  may  involve  surveys,  in-depth 
interviews,  case  studies,  etc.  [Ormala,  1994].  Barker  [1992]
describes  how  evaluation  experts  coming  from  two  main  communities 
(civil  servants  and  academic  policy  researchers)  interact  in 
evaluation  of  R&D  in  the  UK.  The  performance  of  evaluations, 
including  the  synthesis  of  evidence  and  the  production  of 
conclusions  and  recommendations,  is  done  by  professionals,  as 
opposed  to  panels  of  eminent  persons. 

Problems  with  Peer  Review 

Peer  review  problems  include  [Roy,  1985;  King,  1987; 
Kruytbosch,  1989;  Chubin,  1990,  1994]:

1.  Partiality  of  peers  to  impact  the  outcome  for  non-technical 
reasons;

2.  an  'Old  Boy'  network  to  protect  established  fields; 

3.  a  'Halo'  effect  for  higher  likelihood  of  funding  for  more 
visible  scientists/departments/institutions;

4.  reviewers  differ  in  criteria  to  assess  and  interpret; 

5.  the  peer  review  process  assumes  agreement  about  what  good 
research  is,  and  what  are  promising  opportunities. 

These  potential  problems  should  be  considered  during  the 
process  of  selecting  research  impact  assessment  approaches. 

Another  problem  with  peer  review  is  cost.  The  true  total 
costs  of  peer  review  can  be  considerable  but  tend  to  be  ignored  or 
understated  in  most  reported  cases.  For  serious  panel-type  peer 
reviews,  where  sufficient  expertise  is  represented  on  the  panels, 
total  real  costs  will  dominate  direct  costs  by  as  much  as  an  order 
of  magnitude  or  more  [Kostoff,  1994e].  The  major  contributor  to
total  costs  for  either  type  of  review  (panel  or  mail)  is  the  time  of  all  the
players  involved  in  executing  the  review.  With  high  quality 
performers  and  reviewers,  time  costs  are  high,  and  the  total  review 
costs  can  be  a  non-negligible  fraction  of  total  program  costs, 
especially  for  programs  that  are  people  intensive  rather  than 
hardware  intensive. 
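
The cost argument above can be illustrated with a small arithmetic sketch. Everything in it is hypothetical: the participant roles, head counts, hours, hourly rates, direct costs, and program budget are invented for illustration and are not taken from Kostoff [1994e]. The sketch only shows how participant time can come to dominate direct costs such as travel and honoraria.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One group of people whose time is consumed by the review (all values assumed)."""
    role: str
    count: int
    hours_each: float
    hourly_rate: float  # fully loaded cost per hour, a hypothetical figure


def time_cost(participants):
    """Total cost of the time spent by all participant groups."""
    return sum(p.count * p.hours_each * p.hourly_rate for p in participants)


def review_cost_summary(direct_costs, participants, program_budget):
    """Compare total real cost (direct + time) with direct cost and program budget."""
    t = time_cost(participants)
    total = direct_costs + t
    return {
        "direct": direct_costs,
        "time": t,
        "total": total,
        "total_to_direct_ratio": total / direct_costs,
        "fraction_of_program": total / program_budget,
    }


# Hypothetical panel review of a people-intensive program.
panel = [
    Participant("reviewer", count=10, hours_each=40, hourly_rate=100.0),
    Participant("presenter", count=20, hours_each=60, hourly_rate=80.0),
    Participant("review staff", count=4, hours_each=100, hourly_rate=60.0),
]
summary = review_cost_summary(
    direct_costs=20_000.0, participants=panel, program_budget=5_000_000.0
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these assumed inputs the time of reviewers, presenters, and staff accounts for most of the total, which is the pattern the text describes; different assumptions would of course give different ratios.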

The  issue  of  peer  review  predictability  affects  the 
credibility  of  technological  forecasting  directly.  A  few  studies 
have  been  done  relating  reviewers'  scores  on  component  evaluation 
criteria  to  proposal  or  project  review  outcomes.  Some  studies  have 
been  done  in  which  reviewers'  ratings  of  research  papers  have  been 
compared  to  the  numbers  of  citations  received  by  these  papers  over 
time  [Bornstein,  1991a,  1991b].  Correlations  between  reviewers' 
estimates  of  manuscript  quality  and  impact  and  the  number  of 
citations  received  by  the  paper  over  time  were  relatively  low.  The
author  is  not  aware  of  reported  studies,  singly  or  in  tandem,  that
have  related  peer  review  scores/rankings  of  proposals  to  downstream
impacts  of  the  research  on  technology,  systems,  and  operations,
although  some  initial  efforts  have  begun  [Van  den  Beemt,
1991].  This  type  of  study  would  require  an  elaborate  data  tracking
system  over  lengthy  time  periods  which  does  not  exist  today.  Thus, 
the  value  of  peer  review  as  a  predictive  tool  for  assessing  the 
impact  of  research  on  an  organization's  mission  (other  than 
research  for  its  own  sake)  rests  on  faith  more  than  on  hard 
documented  evidence. 
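
A comparison of reviewers' ratings with later citation counts, of the kind cited above, is commonly summarized with a rank correlation. The following is a minimal sketch using Spearman's rank correlation; the choice of Spearman's coefficient is an assumption (the cited studies' exact statistics are not given here), and the ratings and citation counts are invented for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one reviewer rating and one later citation count per paper.
ratings = [4.5, 3.0, 4.0, 2.5, 3.5, 5.0]
citations = [12, 30, 5, 8, 40, 15]
print(f"rank correlation: {spearman(ratings, citations):.2f}")
```

A value near zero would mean the ratings carried little information about later citations. Extending the same comparison to downstream impacts on technology, systems, and operations would require the long-term tracking data that the text notes does not yet exist.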

PEER  REVIEW  CONCLUSIONS 

Peer  review  is  the  most  widely  used  and  generally  credible 
method  used  to  assess  the  impact  of  research.  Much  of  the 
criticism  of  peer  review  has  arisen  from  misunderstandings  of  its 
accuracy  resolution  as  a  measuring  instrument.  While  a  peer  review
can  gain  consensus  on  the  projects  and  proposals  that  are  either 
outstanding  or  poor,  there  will  be  differences  of  opinion  on  the 
projects  and  proposals  that  cover  the  much  wider  middle  range.  For 
projects  or  proposals  in  this  middle  range,  their  fate  is  somewhat 
more  sensitive  to  the  reviewers  selected.  If  a  key  purpose  of  a 
peer  review  is  to  insure  that  the  outstanding  projects  and 
proposals  are  funded  or  continued,  and  the  poor  projects  are  either 
terminated  or  modified  strongly,  then  the  capabilities  of  the  peer 
review  instrument  are  well  matched  to  its  requirements. 
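
The sensitivity of middle-range proposals to reviewer selection can be illustrated with a small sketch. It assumes a pool of individual reviewer scores for each proposal, a panel formed from any subset of that pool, and a funding cutoff applied to the panel mean; the scores, the 1-10 scale, the panel size, and the cutoff are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def funding_rate_over_panels(scores, panel_size, cutoff):
    """Fraction of all possible panels (subsets of the scorers) whose mean meets the cutoff."""
    panels = list(combinations(scores, panel_size))
    funded = sum(1 for panel in panels if mean(panel) >= cutoff)
    return funded / len(panels)


# Hypothetical scores from five candidate reviewers on a 1-10 scale.
proposals = {
    "outstanding": [9, 9, 8, 9, 8],
    "middle range": [4, 7, 5, 8, 6],
    "poor": [2, 3, 2, 1, 3],
}

for name, scores in proposals.items():
    rate = funding_rate_over_panels(scores, panel_size=3, cutoff=6)
    print(f"{name}: funded by {rate:.0%} of possible three-person panels")
```

Under these assumptions the outstanding and poor proposals receive the same decision from every possible panel, while the middle-range proposal's outcome depends on which three reviewers happen to be chosen.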

However,  the  value  of  peer  review  as  a  predictive  tool  for 
assessing  the  impact  of  research  on  an  organization's  mission 
(other  than  research  for  its  own  sake)  rests  on  faith  more  than  on 
hard  documented  evidence.  Also,  for  serious  panel-type  peer 
reviews  or  mail-type  peer  reviews,  where  sufficient  expertise  is
represented  on  the  panels,  total  real  costs  will  dominate  direct 
costs.  The  major  contributor  to  total  costs  is  the  time  of  all  the 
players  involved  in  executing  the  review.  With  high  quality 
performers  and  reviewers,  time  costs  are  high,  and  the  total  review 
costs  can  be  a  non-negligible  fraction  of  total  program  costs, 
especially  for  programs  that  are  people  intensive  rather  than 
hardware  intensive. 

Most  methods  used  in  practice  include  criteria  which  address 
the  impact  of  research  on  its  own  and  allied  fields,  as  well  as  on 
the  mission  of  the  sponsoring  organization.  Nearer-term  research
impacts  typically  play  a  more  important  role  in  the  review  outcome 
than  longer-term  impacts,  but  do  not  have  quite  the  importance  of 
team  quality,  research  approach,  or  the  research  merit.  A  minimal 
set  of  review  criteria  should  include  team  quality,  research  merit, 
research  approach,  research  productivity,  and  a  criterion  related 
to  longer-term  relevance  to  the  organization's  mission.  More 
important  than  the  criteria  is  the  dedication  of  an  organization's 
management  to  the  highest  quality  objective  review,  and  the 
associated  emplacement  of  rewards  and  incentives  to  encourage 
quality  reviews. 

SEMI-QUANTITATIVE  METHODS  SUMMARY

In  the  evaluation  of  research  performance  and  impact,  a
spectrum  of  approaches  may  be  considered.  At  one  end  of  the 
spectrum  are  the  subjective,  essentially  non-quantitative 
approaches,  of  which  peer  review  is  the  prototype  [Chubin,  1994]. 
At  the  other  end  of  the  spectrum  are  the  mainly  quantitative 
approaches,  such  as  evaluative  bibliometrics  and  cost-benefit  analysis
[Narin,  1994;  Mansfield,  1991].  In  between  are  retrospective  or
case  study  approaches  [Kostoff,  1993d;  Kingsley,  1993]. 

These  retrospective  methods  make  little  use  of  mathematical 
tools,  but  draw  on  documented  approaches  and  results  wherever 
possible.  In  practice,  there  are  two  major  reasons  that  research 
sponsoring  organizations  perform  retrospective  studies  of  research. 
Positive  research  impact  on  the  organization's  mission  provides 
evidence  to  the  stakeholders  that  there  is  benefit  in  continuing 
sponsorship  of  research.  Also,  if  the  study  is  sufficiently 
comprehensive,  the  environmental  parameters  which  helped  the 
research  succeed  can  be  identified,  and  these  lessons  can  be  used 
to  improve  future  research. 

There  are  two  major  variants  of  retrospective  studies.  One 
type  starts  with  a  successful  technology  or  system  and  works 
backwards  to  identify  the  critical  R&D  events  which  led  to  the  end 
product.  The  other  type  starts  with  initial  research  grants  and 
traces  evolution  forward  to  identify  impacts.  The  tracing 
backwards  approach  is  favored  for  two  reasons:  1)  the  data  is 
easier  to  obtain,  since  forward  tracking  is  essentially  non-existent
for  evolving  research;  and  2)  the  sponsors  have  little
interest  in  examining  research  that  may  have  gone  nowhere. 
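
The backward-tracing variant can be sketched as a traversal over a record of which earlier events each event drew upon. The data structure and event names below are hypothetical and are not taken from any of the studies discussed; the sketch only illustrates why backward tracing is easier and why it never encounters research that led nowhere.

```python
from collections import deque


def trace_backward(predecessors, end_product):
    """Breadth-first walk from an end product back through its contributing events.

    `predecessors` maps each event to the earlier events it drew upon
    (an assumed, illustrative representation of a study's event records).
    """
    seen = {end_product}
    order = []
    queue = deque([end_product])
    while queue:
        node = queue.popleft()
        for prior in predecessors.get(node, []):
            if prior not in seen:
                seen.add(prior)
                order.append(prior)
                queue.append(prior)
    return order


# Hypothetical event records.
predecessors = {
    "fielded system": ["prototype", "materials advance"],
    "prototype": ["applied study A", "basic research grant 1"],
    "materials advance": ["basic research grant 2"],
    "applied study A": ["basic research grant 1"],
    "unrelated technology": ["basic research grant 3"],
}

for event in trace_backward(predecessors, "fielded system"):
    print(event)
```

In this sketch "basic research grant 3" never appears in the trace, because it did not feed the selected end product. A forward trace from each grant would need the reverse links, which is the tracking data the text describes as essentially non-existent.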

While  methods  for  performing  retrospective  and  case  studies 
may  differ  within  and  across  industry  and  government  [Kingsley, 
1993],  especially  concerning  the  research  question,  case  selection, 
and  analytic  framework,  the  fundamental  evaluation  problems 
encountered  are  pervasive  across  these  different  methods.  Now,  a 
few  of  the  more  widely  known  case  studies  will  be  reviewed,  and  the 
key  pervasive  problems  and  findings  will  be  identified.  These 
retrospective  studies  include  Project  Hindsight,  Project  TRACES 
and  its  follow-on  studies,  and  Accomplishments  of  Department  of 
Energy  (DOE)  Office  of  Health  and  Environmental  Research  (OHER)  and 
of  the  Advanced  Research  Projects  Agency  (ARPA).

SPECIFIC  RETROSPECTIVE  STUDIES 

Project  Hindsight 

Project  Hindsight  was  a  retrospective  study  performed  by  the 
Defense  Department  in  the  mid-1960s  to  identify  those  management 
factors  important  in  assuring  that  research  and  technology  programs 
are  productive  and  that  program  results  are  used  [DOD,  1969].  The 
evolution  of  the  new  technology  represented  in  each  of  the  20 
weapons  systems  selected  was  traced  back  in  post-WW2  time  to 
critical  points  called  "Research  or  Exploratory  Development  (RXD)
Events".

Original  TRACES  Study

In  1967,  The  National  Science  Foundation  (NSF)  instituted  a 
study  [IITRI,  1968]  to  trace  retrospectively  key  events  which  had 
led  to  a  number  of  major  technological  innovations.  One  goal  was 
to  provide  more  specific  information  on  the  role  of  the  various 
mechanisms,  institutions,  and  types  of  R&D  activity  required  for 
successful  technological  innovation.  Similar  to  Project  Hindsight, 
key  'events'  in  the  R&D  history  of  each  innovation  selected  were 
identified,  and  their  characteristics  were  examined. 

Follow-On  TRACES  Study 

In  a  follow-on  study  to  TRACES,  the  NSF  sponsored  Battelle- 
Columbus  Laboratories  to  perform  a  case  study  examination  of  the 
process  and  mechanism  of  technological  innovation  [Battelle,  1973]. 
For  each  of  the  ten  innovations  studied,  the  significant  events 
(important  activity  in  the  history  of  an  innovation)  and  decisive 
events  (a  significant  event  which  provides  a  major  and  essential 
impetus  to  the  innovation)  which  contributed  to  the  innovation  were 
identified.  The  influence  of  various  exogenous  factors  on  the 
decisive  events  was  determined,  and  several  important 
characteristics  of  the  innovative  process  as  a  whole  were  obtained. 
The  following  important  exogenous  factors  for  producing  significant 
innovations  were  identified: 

1.  The  technical  entrepreneur  (a  major  driving  force  in  the
innovative  process);

2.  Early  recognition  of  the  need;

3.  Government  funding  (more  generally,  availability  of
financial  support,  from  whatever  source);

4.  The  occurrence  of  an  unplanned  confluence  of  technology
(confluence  of  technology  occurred  for  some  innovations  as  a  result
of  deliberate  planning,  rather  than  by  accident);

5.  Most  of  the  innovations  originated  outside  the  organization 
that  developed  them; 

6.  Additional  supporting  inventions  were  required  during  the 
development  effort  for  all  the  innovations  studied  to  arrive  at  a 
product  with  consumer  acceptance. 

While  the  technical  entrepreneur  is  viewed  as  extremely 
important  to  the  innovative  process,  it  does  not  appear  (to  the 
author)  to  be  the  critical  path  factor.  Examination  of  the
historiographic  tracings  which  display  the  significant  events 
chronologically  for  each  of  the  innovations  shows  that  an  advanced 
pool  of  knowledge  must  be  developed  in  many  fields  before  synthesis 
leading  to  an  innovation  can  occur.  The  entrepreneur  can  be  viewed 
as  an  individual  or  group  with  the  ability  to  assimilate  this 
diverse  information  and  exploit  it  for  further  development. 
However,  once  this  pool  of  knowledge  exists,  there  are  many  persons 
or  groups  with  capability  to  exploit  the  information,  and  thus  the 
real  critical  path  to  the  innovation  is  more  likely  the  knowledge 
pool  than  any  particular  entrepreneur.  The  entrepreneurs  listed  in 
the  study  undoubtedly  accelerated  the  introduction  of  the 
innovation,  but  they  were  at  all  times  paced  by  the  developmental
level  of  the  knowledge  pool. 

Recent  TRACES  Study 

In  a  modern  version  of  the  TRACES  study,  the  National  Cancer 
Institute  initiated  an  assessment  [Narin,  1989]  to  determine 
whether  there  were  certain  research  settings  or  support  mechanisms 
which  were  more  effective  in  bringing  about  important  advances  in 
cancer  research.  The  approach  taken  was  analogous  in  concept  to 
the  initial  TRACES  study,  with  the  addition  of  citation  analyses  to 
provide  an  independent  measure  of  the  impact  of  the  Trace  papers 
(papers  associated  with  each  key  'event'),  and  by  adding  control
sets  of  papers. 

DARPA  Accomplishments  Study 

The  Institute  for  Defense  Analyses  produced  a  document  [IDA,
1991]  describing  the  accomplishments  of  the  Defense  Advanced
Research  Projects  Agency  (DARPA,  now  renamed  ARPA).  Of  the  hundreds
of  projects  and  programs  funded  by  DARPA  over  its  then  (1988)  30 
year  lifetime,  49  were  selected  and  studied  in  detail,  and 
conditions  for  success  were  identified. 

The  qualities  of  DARPA-supported  programs  and  projects  that
contributed  to  success  can  be  summarized  as  follows:

1.  A  need  existed  for  what  the  output  could  do; 

2.  There  was  a  strong  commitment  by  individuals  to  a  concept; 

3.  Bright  and  imaginative  individuals  were  given  the
opportunity  to  pursue  ideas  with  minimal  bureaucratic  encumbrance; 

4.  There  was  an  ongoing  stream  of  technical  developments  and 
evolution; 

5.  DARPA  management  gave  strong,  top-level  management  support; 

6.  There  was  explicit  effort,  taken  early,  to  improve 
acceptance  by  the  user  community. 

DOE  OHER  Accomplishments  Book 

The  approach  taken  by  DOE  was  to  describe  the  40-year  history 
of  OHER  [DOE,  1983,  1986],  and  present  selected  accomplishments  in 
different  research  areas  from  different  points  in  time.  This 
technique  allowed  impacts  and  benefits  of  the  research  to  be 
tracked  through  time,  and  in  some  cases  to  be  quantified  as  well. 

PRINCIPLES  OF  HIGH  QUALITY  RETROSPECTIVE  STUDIES 

A  careful  reading  of  the  above,  and  many  other,  retrospective 
studies  shows  that  it  is  difficult  to  assess  study  quality  on  the 
sole  basis  of  the  published  report.  However,  principles  for  high 
quality  retrospective  studies  have  been  generated  by  the  author  by 
integrating  the  contents  of  these  reports  with  personal  experience 
in  conducting  retrospective  studies.  A  high  quality  retrospective 
study  is  an  accurate  reflection  of  the  evolution  and  relation  of 
all  critical  sciences  and  technologies  which  resulted  in  the 
technology  of  present  interest.  Thus,  a  high  quality  retrospective 
study  is  analogous  to  a  high  resolution  picture  of  the  evolving
relationships  among  science  and  technology  areas  related  critically 
to  the  focal  technology,  and  incorporates  especially  the  concepts 
of  awareness,  coordination,  and  completeness.  More  specific 
requirements,  or  underlying  principles,  necessary  for  a  high 
quality  retrospective  study  can  be  formulated  as  follows. 

The  most  important  factor  is  the  commitment  of  the 
retrospective  study  organization's  senior  management  to 
high-quality  retrospective  studies,  and  the  associated  emplacement 
of  rewards  and  incentives  to  encourage  such  retrospective  studies. 

The  second  most  important  factor  is  the  retrospective  study 
manager's  motivation  to  construct  a  technically  credible  and 
visionary  retrospective  study.  The  retrospective  study  manager 
sets  the  boundary  conditions  and  constraints  on  the  retrospective 
study  scope,  structures  the  working  groups,  and  selects  the  final 
retrospective  study  elements  from  a  myriad  of  inputs.  In  some 
organizations,  the  retrospective  study  manager  has 
the  latitude  to  select  the  complete  retrospective  study  process  and 
criteria,  and  in  all  organizations  presently  has  the  latitude  to 
select  the  retrospective  study  contributing  technical  experts  by  a 
non-random  process.  If  the  retrospective  study  manager  does  not 
follow,  either  consciously  or  subconsciously,  the  highest  standards 
in  selecting  these  experts,  the  retrospective  study's  final  form 
could  be  substantially  determined  even  before  the  study  process 
begins. 

The  third  most  important  factor  consists  of  the  study  experts' 
competence  and  objectivity.  Each  expert  should  be  technically 
competent  in  his  subject  area,  and  the  competence  of  the  total 
retrospective  study  development  team  should  cover  the  multiple 
research  and  technology  areas  critically  related  to  the  science  or 
technology  area  of  present  interest.  In  addition,  the  team's  focus 
should  not  be  limited  to  disciplines  related  only  to  the  focal 
technology  area  (which  tends  to  reinforce  the  status  quo  and 
further promulgate development along very narrow lines), but should
be  broadened  to  disciplines  and  technologies  well  beyond  the  focal 
technology. 

For  retrospective  studies  which  will  be  used  as  a  basis  for 
comparison  of  science  and  technology  programs  or  projects,  the 
fourth  most  important  factor  is  normalization  and  standardization 
across  different  retrospective  studies,  study  component  teams,  and 
science  and  technology  areas.  For  science  and  technology  areas 
which  have  some  similarity,  use  of  common  experts  (on  the  study 
teams)  with  broad  backgrounds  which  overlap  the  disciplines  can 
provide  some  degree  of  standardization.  For  very  disparate  science 
and  technology  areas,  some  allowances  need  to  be  made  for  the 
relative  strategic  value  of  each  discipline  to  the  organization, 
and  arbitrary  corrections  applied  for  benefit  estimation 
differences  and  biases.  Even  in  this  case  of  disparate 
disciplines,  some  normalization  is  possible  by  having  some  common 
team  members  with  broad  backgrounds  contributing  to  the 
retrospective  studies  for  diverse  programs  and  projects. 

The  fifth  most  important  factor  is  criteria  for  retrospective 
study component selection. Since retrospective studies tend to
focus on the critical science and technology events which led to
successful technologies/systems, the definition of criteria for
'successful'  and  'critical'  is  of  utmost  importance  for 
establishing  the  credibility  of  the  retrospective  study. 

A factor of equal importance is reliability or repeatability.
To  what  degree  would  a  retrospective  study  be  replicated  if  a 
completely  different  team  were  involved  in  its  construction?  If 
each  team  were  to  construct  a  completely  different  retrospective 
study  for  the  same  topic,  then  what  meaning  or  credibility  or  value 
can  be  assigned  to  any  retrospective  study?  To  minimize 
repeatability problems, a reasonably sizeable segment of the
competent technical community should be involved in the
construction and review of the retrospective study. For
government-constructed  retrospective  studies,  this  does  not  present 
a  conceptual  problem,  although  it  might  present  a  logistics  problem 
for  sufficiently  large  community  involvement.  For  industry- 
constructed  retrospective  studies,  where  proprietary  problems  could 
arise  if  the  external  community  becomes  involved,  the  participation 
may have to be limited. The recommendation should be
reinterpreted as 'to the degree possible within organizational
constraints'.

A  sixth  critical  factor  for  quality  retrospective  studies  is 
cost.  The  true  total  costs  of  developing  a  high  quality 
retrospective  study  with  substantial  community  input  can  be 
considerable,  but  tend  to  be  understated.  For  high  quality 
retrospective  studies,  where  sufficient  expertise  is  represented  on 
the  study  team,  the  major  contributor  to  total  costs  is  the  time  of 
all  the  individuals  involved  in  developing  and  reviewing  the 
retrospective  study.  With  high  quality  personnel  involved  in  the 
development  and  review  process,  time  costs  are  high,  and  the  total 
study  costs  can  be  non-negligible.  Costs  should  not  be  neglected 
in  designing  a  high  quality  retrospective  study  development 
process. 

The  final  critical  factor,  and  perhaps  the  foundational 
factor,  in  high  quality  retrospective  study  development  is  the 
maintenance  of  high  ethical  standards  throughout  the  process. 
There  is  a  plethora  of  potential  ethical  issues,  because  there  is 
an inherent bias/conflict of interest in the process when real
experts  are  desired  as  retrospective  study  performers  and 
reviewers.  The  retrospective  study  development  managers  need  to  be 
vigilant  for  undue  signs  of  distortion  aimed  at  personal  gain. 

SEMI-QUANTITATIVE  METHODS  CONCLUSIONS 

Hindsight,  TRACES,  and,  to  some  degree,  the  OHER  and  DARPA 
accomplishments  books  had  some  similar  themes.  All  these  methods 
used  a  historiographic  approach,  looked  for  significant  research  or 
development  events  in  the  metamorphosis  of  research  programs  in 
their  evolution  to  products,  and  attempted  to  convince  the  reader 
that:  (1)  the  significant  research  and  exploratory  development 
events  in  the  development  of  the  product  or  process  were  the  ones 
identified; (2) typically, the organization sponsoring the study
was  responsible  for  some  of  the  (critical)  significant  events;  (3) 
the  final  product  or  process  to  which  these  events  contributed  was 
important;  and  (4)  while  the  costs  of  the  research  and  development 
were  not  quantified,  and  the  benefits  (typically)  were  not 
quantified,  the  research  and  development  were  worth  the  cost. 

Six  critical  conditions  for  innovation  were  identified  through 
analysis  of  these  retrospective  studies.  The  most  important 
condition  appears  to  be  the  existence  of  a  broad  pool  of  knowledge 
which  minimizes  critical  path  obstacles  and  can  be  exploited  for 
development  purposes.  This  condition  is  followed  in  importance  by 
a technical entrepreneur who sees the technical opportunity and
recognizes  the  need  for  innovation,  and  who  is  willing  to  champion 
the  concept  for  long  time  periods,  if  necessary.  Also  valuable  are 
strong  financial  and  management  support  coupled  with  many 
continuing  inventions  in  different  areas  to  support  the  innovation. 

As the historiographic analyses (Hindsight/TRACES) of a
technology  or  system  have  shown,  if  the  time  interval  in  which  the 
antecedent  critical  events  occur  is  arbitrarily  truncated,  as  in 
the  two-decade  time  interval  Hindsight  case,  the  impacts  of  basic 
research  on  the  technology  or  system  will  not  be  given  adequate 
recognition.  The  number  of  mission  oriented  research  events  peaks 
about  a  decade  before  the  technology  innovation.  However,  the 
number  of  non-mission  oriented  research  events  peaks  about  three 
decades  before  the  technology  innovation,  and  eight,  nine,  or  more 
decades  may  be  necessary  in  some  cases  to  recognize  the  original 
critical  antecedent  events.  Over  a  long  time  interval,  the 
majority  of  key  R&D  events  tend  to  be  non-mission  oriented.  Thus, 
future  studies  of  this  type  should  allow  time  intervals  of  many 
decades to ensure that critical non-mission oriented research
events  are  captured. 
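
The effect of truncating the look-back window can be illustrated
with a toy calculation. The sketch below is only an illustration:
the event list (years before the innovation, and whether each event
was mission oriented) is invented for the example and is not
Hindsight or TRACES data, and the function name is hypothetical.

```python
# Hypothetical antecedent events for one innovation:
# (years before the innovation, is_mission_oriented).
# All values are invented for illustration only.
EVENTS = [
    (5, True), (8, True), (10, True), (12, True), (15, True),
    (18, False), (25, False), (30, False), (32, False), (35, False),
    (45, False), (60, False), (85, False),
]


def captured_share(events, window_years, mission_oriented):
    """Fraction of events of one kind that fall inside the look-back window."""
    of_kind = [years for years, mission in events if mission == mission_oriented]
    if not of_kind:
        return 0.0
    inside = [years for years in of_kind if years <= window_years]
    return len(inside) / len(of_kind)


if __name__ == "__main__":
    for window in (20, 50, 100):
        mission = captured_share(EVENTS, window, mission_oriented=True)
        basic = captured_share(EVENTS, window, mission_oriented=False)
        print(f"{window:>3}-year window: mission {mission:.0%}, "
              f"non-mission {basic:.0%}")
```

With data shaped like this, a two-decade window retains all of the
mission oriented events but only a small part of the non-mission
oriented ones, which is the distortion described above.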

Even  in  those  cases  when  an  adequate  time  interval  was  used, 
and  critical  non-mission  oriented  events  were  identified,  the 
cumulative  indirect  impacts  of  basic  research  were  not  accounted 
for  by  any  of  the  retrospective  approaches  published  or  in  use 
today.  A  recent  study  [Kostoff,  1994i]  which  examined  impacts  of 
research  on  other  research  and  technology  through  direct  and 
indirect  paths  using  a  network  approach  showed  that  the  indirect 
impacts  of  fundamental  research  can  be  very  large  in  a  cumulative 
sense.  Future  retrospective  studies  would  be  more  credible  if  they 
devote  more  effort  to  identifying  indirect  impacts  of  research. 
While  indirect  impacts  of  research  are  much  more  difficult  to 
identify  than  direct  impacts,  and  the  data  gathering  effort  is  much 
larger  and  more  complex,  neglect  of  indirect  impacts  reduces 
appreciation  of  the  value  of  basic  research  significantly.  Use  of 
some  of  the  advanced  computer-based  technologies  available  today, 
such  as  the  network  approach  referenced  above  or  citation  analysis 
[Narin,  1989],  could  identify  many  of  the  pathways  of  the  indirect 
impacts  of  research. 
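
The distinction between direct and indirect impacts can be sketched
on a small directed graph. This is a minimal illustration, not the
network method of [Kostoff, 1994i]: the graph, the node names, and
the function are assumptions made for the example. An edge from A to
B means that work B drew on work A.

```python
from collections import deque

# Hypothetical "builds on" network; node names are invented.
INFLUENCES = {
    "basic_result": ["method_1", "method_2"],
    "method_1": ["device_a"],
    "method_2": ["device_a", "device_b"],
    "device_a": ["system_x"],
    "device_b": ["system_x", "system_y"],
    "system_x": [],
    "system_y": [],
}


def direct_and_indirect(graph, source):
    """Return (direct, indirect) sets of works influenced by `source`."""
    direct = set(graph.get(source, []))
    reached = set()
    queue = deque(direct)
    while queue:  # breadth-first walk over everything downstream
        node = queue.popleft()
        if node in reached:
            continue
        reached.add(node)
        queue.extend(graph.get(node, []))
    reached.discard(source)  # guard against cycles leading back to the source
    return direct, reached - direct


if __name__ == "__main__":
    direct, indirect = direct_and_indirect(INFLUENCES, "basic_result")
    print("direct:", sorted(direct))
    print("indirect:", sorted(indirect))
```

Even in this small example the indirectly influenced works outnumber
the directly influenced ones, which is the kind of cumulative effect
that a direct-impact count alone would miss.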

A  detailed  reading  of  those  studies  which  attempted  to 
incorporate  economic  quantification  showed  the  difficulties  of 
trying  to  identify,  assign,  and  quantify  costs  and  benefits  of 
basic research, especially at a project/investigator level. As
TRACES  and  other  similar  studies  have  shown,  the  chain  of  events 
leading  to  an  innovation  is  long  and  broad.  Many  researchers  over 
many  years  have  been  involved  in  the  chain,  and  many  funding 
agencies,  some  simultaneously  with  the  same  researchers,  may  have 
been  involved.  The  allocation  of  costs  and  benefits  under  such 
circumstances  is  a  very  difficult  and  highly  arbitrary  process. 
The  allocation  problem  is  reduced,  but  not  eliminated,  when  the 
analysis  is  applied  at  the  macro  level  (integrating  across 
individual  researchers,  organizations,  etc.). 

One  goal  of  all  the  studies  presented  was  to  identify  the 
products  of  research  and  some  of  their  impacts.  The  Hindsight, 
TRACES,  and  ARPA  studies  tried  to  identify  factors  which  influenced 
the  productivity  and  impact  of  research.  The  following  conclusions 
about the role and impact of basic research were reached:

1.  The  majority  of  basic  research  events  which  directly 
impacted  technologies  or  systems  were  non-mission  oriented  and 
occurred  many  decades  before  the  technology  or  system  emerged; 

2. The cumulative indirect impacts of basic research were not
accounted for by any of the retrospective approaches published;

3. An advanced pool of knowledge must be developed in many
fields  before  synthesis  leading  to  an  innovation  can  occur; 

4.  Allocation  of  benefits  among  researchers,  organizations, 
and  funding  agencies  to  determine  economic  returns  from  basic 
research  is  very  difficult  and  arbitrary,  especially  at  the  micro 
level.

While  these  approaches  do  provide  interesting  information  and 
insight  into  the  transition  process  from  research  to  development  to 
products,  processes,  or  systems,  the  arbitrary  selectivity  and 
anecdotal  nature  of  many  of  the  results  render  any  conclusions  as 
to  cost-effectiveness  or  generalizability  suspect.  Supplementary 
analyses  using  other  approaches  are  required  for  further 
justification  of  the  value  of  the  R&D. 

QUANTITATIVE  METHODS  SUMMARY 

Quantitative  approaches  to  research  assessment  focus  on  the 
numerics  associated  with  the  performance  and  outcomes  of  research. 
The  main  approaches  used  are  bibliometrics  and  econometrics  such  as 
cost-benefit  and  production  function  analysis.  This  summary 
focuses  on  these  three  main  approaches,  briefly  describes  the 
bibliometrics-related  family  of  approaches  known  as  co-occurrence 
phenomena,  briefly  describes  a  network  modeling  approach  to 
quantifying  research  impacts,  and  ends  with  an  expert  systems 
approach  for  supporting  research  assessment. 

BIBLIOMETRICS 

Bibliometrics,  especially  evaluative  bibliometrics,  uses 
counts  of  publications,  patents,  citations  and  other  potentially 
informative items to develop science and technology performance
indicators.  The  choice  of  important  bibliometric  indicators  to  use 
for  research  performance  measurement  may  not  be  straightforward. 
A  1993  study  surveyed  about  4,000  researchers  to  identify 
appropriate  bibliometric  indicators  for  their  particular 
disciplines  [Australia,  1993].  The  respondents  were  grouped  in 
major  discipline  categories  across  a  broad  spectrum  of  research 
areas.  While  the  major  discipline  categories  agreed  on  the 
importance  of  publications  in  refereed  journals  as  a  performance 
indicator,  there  was  not  agreement  about  the  relative  values  of  the 
remaining  19  indicators  provided  to  the  respondents.  For  the 
respondents  in  total,  the  important  performance  indicators  were: 

1. Publications (publication of research results in refereed
journals);

2. Peer Reviewed Books (research results published as
commercial books reviewed by peers);

3. Keynote Addresses (invitations to deliver keynote
addresses, or present refereed papers and other refereed
presentations at major conferences related to one's profession);

4. Conference Proceedings (publication of research results in
refereed conference proceedings);

5. Citation Impact (publication of research results in
journals weighted by citation impact);

6. Chapters in Books (research results published as chapters
in commercial books reviewed by peers);

7. Competitive Grants (ability to attract competitive, peer
reviewed grants from the ARC, NH&MRC, rural R&D corporations and
similar government agencies).

These bibliometric indicators can be used as part of an
analytical  process  to  measure  scientific  and  technological 
accomplishment.  Because  of  the  volume  of  documented  scientific  and 
technological  accomplishments  being  produced  (5,000  scientific 
papers  published  in  refereed  scientific  journals  every  working  day 
worldwide;  1,000  new  patent  documents  issued  every  working  day 
worldwide), use of computerized analyses incorporating quantitative
indicators  is  necessary  to  understand  the  implications  of  this 
technical  output  [Narin,  1994]. 

Narin  states  three  axioms  that  underlie  the  utilization  and 
validity  of  bibliometric  analysis.  The  first  axiom  is  activity 
measurement:  that  counts  of  patents  and  papers  provide  valid 
indicators  of  R&D  activity  in  the  subject  areas  of  those  patents  or 
papers,  and  at  the  institution  from  which  they  originate.  The 
second  axiom  is  impact  measurement:  that  the  number  of  times  those 
patents  or  papers  are  cited  in  subsequent  patents  or  papers 
provides  valid  indicators  of  the  impact  or  importance  of  the  cited 
patents  and  papers.  However,  there  could  be  weightings  applied  to 
the  raw  count  data,  depending  on  the  perceived  importance  of  the 
journals  containing  the  citing  papers.  Also,  the  impacts  would  be 
on allied research fields or technologies, not necessarily
long-term impacts on the originating organization's mission. The third
axiom is linkage measurement: that the citations from papers to
papers,  from  patents  to  patents  and  from  patents  to  papers  provide 
indicators  of  intellectual  linkages  between  the  organizations  which 
are  producing  the  patents  and  papers,  and  knowledge  linkage  between 
their  subject  areas  [Narin,  1994]. 
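
The three axioms can be made concrete with a small sketch. The
records, organization names, and function names below are all
hypothetical, and the code shows only the raw counts that each axiom
treats as an indicator; it applies none of the journal weightings
mentioned above.

```python
from collections import Counter

# Hypothetical records: each document has an id, a producing organization,
# and the ids of the documents it cites. All values are invented.
DOCUMENTS = [
    {"id": "p1", "org": "Lab A", "cites": []},
    {"id": "p2", "org": "Lab A", "cites": ["p1"]},
    {"id": "p3", "org": "Univ B", "cites": ["p1", "p2"]},
    {"id": "t1", "org": "Firm C", "cites": ["p1", "p3"]},
]


def activity_counts(documents):
    """Axiom 1 (activity): number of documents produced per organization."""
    return Counter(doc["org"] for doc in documents)


def impact_counts(documents):
    """Axiom 2 (impact): number of times each document is cited."""
    return Counter(cited for doc in documents for cited in doc["cites"])


def linkage_counts(documents):
    """Axiom 3 (linkage): citations between distinct organizations,
    keyed by (citing organization, cited organization)."""
    org_of = {doc["id"]: doc["org"] for doc in documents}
    links = Counter()
    for doc in documents:
        for cited in doc["cites"]:
            cited_org = org_of.get(cited)
            if cited_org is not None and cited_org != doc["org"]:
                links[(doc["org"], cited_org)] += 1
    return links


if __name__ == "__main__":
    print(activity_counts(DOCUMENTS))
    print(impact_counts(DOCUMENTS))
    print(linkage_counts(DOCUMENTS))
```

Whether such counts are accepted as valid indicators is exactly what
the axioms assert; the code only tallies them.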

Use  of  bibliometrics  can  be  categorized  into  four  levels  of 
aggregation  [Narin,  1994]: 

1. policy (evaluation of national or regional technical
performance);

2. strategy (evaluation of the scientific performance of
universities or the technological performance of companies);

3. tactics (tracing and tracking R&D activity in specific
scientific and technological areas or problems);

4. conventional (identifying specific activities and specific
people engaged in research and development).

Policy  questions  deal  with  the  analysis  of  very  large  numbers 
of  papers  and  patents,  often  hundreds  of  thousands  at  a  time,  to 
characterize  the  scientific  and  technological  output  of  nations  and 
regions.  Strategic  analyses  tend  to  deal  with  thousands  to  tens  of 
thousands  of  papers  or  patents  at  a  time,  numbers  that  characterize 
the  publication  or  patent  output  of  universities  and  companies. 
Tactical  analyses  tend  to  deal  with  hundreds  to  thousands  of  papers 
or  patents,  and  deal  typically  with  activity  within  a  specific 
subject  area.  Finally,  conventional  information  retrieval  tends  to 
deal  with  identifying  individual  papers,  patents,  inventors  and 
clusters  of  interest  to  an  individual  scientist  or  engineer  or 
research  manager  working  on  a  specific  research  project. 

The  first,  and  major,  step  in  the  performance  of  a  high 
quality bibliometric analysis in any of the above four levels of
aggregation  is  acceptance  by  the  potential  user  of  the  above  three 
axioms  to  validate  the  credibility  of  the  bibliometric  approach. 
Once  this  hurdle  has  been  passed,  the  second  step  is  to  select  the 
highest  quality  and  reliability  raw  indicator  products  (data  and 
databases)  and  apply  analyses  of  the  highest  statistical  precision 
and  accuracy  to  these  indicators  [Braun,  1989,  1990,  1993].  The 
third  step,  which  in  many  cases  will  determine  the  utility  of  the 
results,  is  the  interpretation  and  visual  display  of  the  results. 
The  results  of  the  most  stringent  analyses  will  be  relatively 
worthless  if  they  are  not  displayed  in  a  concise  and  lucid  form. 

Indicators  can  be  arranged  in  one  or  more  dimensions. 
Emphasis  has  always  been  laid  on  the  necessity  of  multidimensional 
thinking  while  analyzing  scientometric  indicators.  Scientific 
research  is  a  multifaceted  human  activity,  and  overemphasizing  any 
of  its  aspects  (publication  productivity,  citation  influence, 
technological  applicability,  etc.)  may  lead  to  serious  distortions 
in  its  assessment.  While  each  scientometric  indicator  represents 
a  single  component  of  a  multidimensional  manifold  which  itself  is 
just  one  element  in  assessing  a  complex  system,  presentations  in 
one  or  several  dimensions  may  equally  prove  useful  [Braun,  1993]. 

The  most  direct  way  of  presenting  scientometric  indicators  is 
in one dimensional ranked lists. While simplistic, this approach
reflects  the  paramount  competitiveness  of  the  scientific 
enterprise.  Linear  rankings  are  most  attractive  for  presentation 
to  the  larger  non-specialist  audience  (see  Braun  [1993]). 

Two  dimensional  displays  can  include  relational  charts  or 
scatter  plots  for  correlations.  In  two  dimensional  relational 
charts  [Schubert,  1986;  Braun,  1987],  pairs  of  indicators  (observed 
vs.  expected  citation  rates  or  attractivity  vs.  activity 
indices) are displayed in a planar orthogonal coordinate system.
Emphasis is shifted from ranking to the formation of groups or
'clusters' and other characteristic relations among various
indicators. 
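
One way to read an observed versus expected relational chart is to
compare each point with the diagonal on which the two rates are
equal. The sketch below assumes that reading; the publication sets,
the numbers, the tolerance band, and the function names are all
invented for illustration. The ratio of observed to expected rate is
taken here to be the relative citation rate.

```python
# Hypothetical (observed, expected) mean citation rates per paper for a
# few publication sets. Names and numbers are invented for illustration.
CITATION_RATES = {
    "Set A": (4.2, 3.0),
    "Set B": (2.1, 3.5),
    "Set C": (5.0, 5.0),
}


def relative_rate(observed, expected):
    """Ratio of observed to expected citation rate (1.0 lies on the diagonal)."""
    if expected <= 0:
        raise ValueError("expected citation rate must be positive")
    return observed / expected


def position(observed, expected, tolerance=0.05):
    """Classify a point against the observed = expected diagonal.

    The tolerance band is an arbitrary assumption for this sketch.
    """
    ratio = relative_rate(observed, expected)
    if ratio > 1 + tolerance:
        return "above"
    if ratio < 1 - tolerance:
        return "below"
    return "on"


if __name__ == "__main__":
    for name, (observed, expected) in CITATION_RATES.items():
        print(name, round(relative_rate(observed, expected), 2),
              position(observed, expected))
```

Points classified the same way tend to fall in the same region of
the chart, which is the grouping the relational chart is meant to
expose.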

An  obvious  deficiency  of  the  relational  charts  is  the  lack  of 
any  indication  of  the  size  of  the  sets  of  publications  underlying 
the  points  of  the  diagram.  By  adding  the  third  dimension  of 
publication  size,  this  objection  can  be  overcome.  The  basic  idea 
of  'landscaping'  national  scientific  performances  is  to  represent 
the  size  by  the  'mass'  of  a  mountain-like  formation.  If  two  or 
more  countries  have  similar  citation  characteristics,  the  peaks 
representing  them  may  get  superimposed  forming  chains,  massifs,  and 
other  surface  formations.  An  example  is  presented  in  Braun  [1991]. 

There  seems  to  be  a  natural  limit  of  graphical  presentation  at 
three  dimensions.  There  are  techniques,  however,  to  overcome  this 
apparent  restriction.  A  rather  original  method  of  representing 
multivariate data was proposed by Herman Chernoff: "Each point in
k-dimensional space, k<=18, is represented by a cartoon face whose
features, such as length of nose and curvature of mouth, correspond
to components of the point. Thus every multivariate observation is
visualized  as  a  computer  drawn  face.  This  presentation  makes  it 
easy  for  the  human  mind  to  grasp  many  of  the  essential  regularities 
and  irregularities  present  in  the  data." 

Braun  [1993]  shows  a  face  pattern  with  18  facial  features 
applicable  in  representing  multidimensional  data.  Schubert  [1992] 
contains a four-dimensional example of applying Chernoff faces in
scientometrics:  uncitedness,  citation  rate  per  cited  paper,  mean 
expected  citation  rate  and  relative  citation  rate  are  represented 
by  the  shape  of  face,  size  of  eyes,  length  of  nose  and  curvature 
and  length  of  mouth,  respectively. 

Problems  with  publication  and  citation  counts  include  [King, 
1987;  Oberski,  1988;  OTA,  1986;  White,  1989]: 

A)  Publication  counts: 

1.  indicates  quantity  of  output,  not  quality; 

2. non-journal methods of communication ignored;

3.  publication  practices  vary  across  fields,  journals, 
employing  institutions; 

4.  choice  of  a  suitable,  inclusive  database  is  problematical; 

5.  undesirable  publishing  practices  (artificially  inflated 
numbers  of  co-authors,  artificially  shorter  papers)  increasing. 

B) Citations:

1. intellectual link between citing source and reference
article  may  not  always  exist; 

2.  incorrect  work  may  be  highly  cited; 

3.  methodological  papers  among  most  highly  cited; 

4.  self-citation  may  artificially  inflate  citation  rates; 

5.  citations  lost  in  automated  searches  due  to  spelling 
differences  and  inconsistencies; 

6.  Science  Citation  Index  (SCI)  changes  over  time; 

7.  SCI  biased  in  favor  of  English  language  journals; 

8.  same  problems  as  publication  counts. 
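
Some of the problems listed under B) can be partly controlled when
the counts are made. As an example, the sketch below separates
self-citations (item 4) from external citations by checking for
shared authors. The records and names are invented, and matching on
name strings is itself exposed to the spelling inconsistencies noted
in item 5, so this is an illustration rather than a robust procedure.

```python
# Hypothetical citation records: (authors of the citing paper, authors of
# the cited paper). Names are invented for illustration.
CITATIONS = [
    ({"Adams", "Baker"}, {"Adams"}),        # self-citation (shared author)
    ({"Clark"}, {"Adams"}),
    ({"Diaz", "Evans"}, {"Adams", "Ford"}),
    ({"Ford"}, {"Adams", "Ford"}),          # self-citation (shared author)
]


def citation_counts(citations):
    """Return (total, external), where external excludes self-citations.

    A citation is treated as a self-citation when the citing and cited
    papers share at least one author name.
    """
    total = len(citations)
    external = sum(1 for citing, cited in citations if not (citing & cited))
    return total, external


if __name__ == "__main__":
    total, external = citation_counts(CITATIONS)
    print(f"total citations: {total}, excluding self-citations: {external}")
```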

In  addition,  one  of  the  main  concerns  with  using  citations  as 
a  stand-alone  measure  of  quality  and  impact  has  been  the  potential 
bimodal  interpretation  of  the  numerical  results.  A  paper  could 
receive  high  citations  because  of  its  high  quality,  or  because  the 
citers  disagree  with  it.  However,  there  is  a  third  interpretation 
which  further  precludes  citations  being  utilized  in  stand-alone 
mode, which the author has termed the "Pied Piper" effect.

Assume  there  is  a  present-day  mainstream  approach  in  a 
specific field of research; for example, the chemical/radiation/
surgical approach to treating cancer (see Section IV-C-3 for a more
detailed example of the "Pied Piper" effect). Assume that in, say,
fifty  years  a  cure  for  cancer  is  discovered,  and  the  curative 
approach  has  nothing  to  do  with  today's  research.  In  fact,  assume 
it  turns  out  that  today's  approach  was  completely  orthogonal  or 
even  antithetical  to  the  correct  approach.  Then  what  meaning  can 
be  ascribed  to  research  papers  in  cancer  today  which  are  highly 
cited  for  supposedly  positive  reasons? 

In  this  case,  a  paper's  citations  are  a  measure  of  the  extent 
to which the paper's author has persuaded the research community
that  the  research  direction  contained  in  his  paper  is  the  correct 
one,  and  not  a  measure  of  the  intrinsic  correctness  of  the  research 
direction.  In  fact,  the  citations  may  reflect  the  desire  of  a 
closed  research  community  (the  author  and  the  citers)  to  persuade 
a  larger  community  (which  could  include  politicians  and  other 
resource  allocators)  that  the  research  direction  is  the  correct 
one. This is the "Pied Piper" effect. The large number of
citations  in  the  above  hypothetical  medical  example  becomes  a 
measure  of  the  extent  of  the  problem,  the  extent  of  the  diversion 
from  the  correct  path,  not  the  extent  of  progress  toward  the 
solution. The "Pied Piper" effect is a key reason why, especially
in  the  case  of  revolutionary  research,  citations  and  other 
quantitative  measures  must  be  part  of  and  subordinate  to  a  broadly 
constituted  peer  review  in  any  credible  evaluation  and  assessment 
of  research  impact  and  quality. 

There  are  few  Federally-supported  bibliometric  studies 
reported  in  the  literature.  In  addition  to  the  above  problems, 
another  reason  for  limited  Federal  use  can  be  inferred  from  Narin 
[1976],  where  studies  on  the  publication  and  citation  distribution 
functions  for  individuals  are  reviewed.  The  conclusion  drawn,  from 
studies  such  as  those  of  Lotka,  Shockley,  De  Solla  Price,  and  Cole 
and  Cole,  is  that  very  few  of  the  active  researchers  are  producing 
the heavily cited papers. How motivated are funding agencies to
report  these  hyperbolic  productivity  distributions  for  different 
programs  in  the  open  literature,  especially  since  many  questions 
exist  as  to  the  accuracy  and  completeness  of  the  bibliometric 
indicators?  This  conclusion  raises  the  further  question  of  the 
role  actually  played  by  the  less  productive  researchers  (as 
measured by publication and citation counts): is the productivity
of  the  elite  somehow  dependent  on  the  output  of  the  less 
influential,  or  is  the  role  of  the  less  productive  members  that  of 
maintaining  the  stability  of  the  research  infrastructure  and 
educating  future  generations  of  researchers? 
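
The skewness of such productivity distributions can be illustrated
with Lotka's inverse-square law, under which the number of authors
with n papers falls off roughly as 1/n^2. The sketch below is an
idealized model, not data from the studies cited above: the
exponent of 2, the cutoff on the maximum number of papers per
author, and the function names are assumptions. It models
publication counts only and says nothing about citations.

```python
def lotka_shares(max_papers, exponent=2.0):
    """Share of authors with n = 1..max_papers papers under a 1/n**exponent law.

    Truncating at max_papers is an assumption made so the shares sum to 1.
    """
    weights = [1.0 / n ** exponent for n in range(1, max_papers + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_authors_with_at_least(shares, min_papers):
    """Fraction of authors who wrote at least `min_papers` papers."""
    return sum(shares[min_papers - 1:])


def share_of_papers_from_top(shares, min_papers):
    """Fraction of all papers written by authors with at least `min_papers`."""
    papers = [n * share for n, share in enumerate(shares, start=1)]
    return sum(papers[min_papers - 1:]) / sum(papers)


if __name__ == "__main__":
    shares = lotka_shares(max_papers=100)
    print(f"authors with exactly one paper: {shares[0]:.1%}")
    print(f"authors with 10+ papers: "
          f"{share_of_authors_with_at_least(shares, 10):.1%}")
    print(f"papers written by those authors: "
          f"{share_of_papers_from_top(shares, 10):.1%}")
```

Under these assumptions a small minority of authors accounts for a
disproportionate share of the papers, which is the pattern a funding
agency would be reporting if it published such distributions.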

Macroscale  bibliometric  studies  characterize  science  activity 
at  the  national  [e.g.,  Hicks,  1986;  Braun,  1989],  international, 
and  discipline  level.  The  biennial  Science  and  Engineering 
Indicators  report  [NSF,  1989]  tabulates  data  on  characteristics  of 
personnel  in  science,  funds  spent,  publications  and  citations  by 
country  and  field,  and  many  other  bibliometric  indicators.  Another 
study  at  the  national  level  was  aimed  at  evaluating  the  comparative 
international  standing  of  British  science  [Martin,  1990].  Using 
publication  counts  and  citation  counts,  the  authors  evaluated 
scientific  output  of  different  countries  by  technical  discipline  as 
a  function  of  time.  Much  more  understanding  is  required  as  to 
which  indicators  are  appropriate  and  how  they  should  impact 
allocation  decisions. 

There  have  been  numerous  microscale  bibliometric  studies 
reported in the literature [e.g., Frame, 1983; McAllister, 1983;
Mullins,  1987,  1988;  Moed,  1988;  Irvine,  1989;  Van  Raan,  1989; 
Luukkonen,  1990a,  1990b,  1992].  The  NIH  bibliometric-based 
evaluations  [OTA,  1986]  included  the  effectiveness  of  various 
research  support  mechanisms  and  training  programs,  the  publication 
performance  of  the  different  institutes,  the  responsiveness  of  the 
research  programs  to  their  congressional  mandate,  and  the 
comparative  productivity  of  NIH-sponsored  research  and  similar 
international programs.

Two  papers  [Narin,  1987b,  1989]  described  determination  of 
whether  significant  relationships  existed  among  major  cancer 
research  events,  funding  mechanisms,  and  performer  locations; 
compared  the  quality  of  research  supported  by  large  grants  and 
small  grants  from  the  National  Institute  of  Dental  Research; 
evaluated  patterns  of  publication  of  the  NIH  intramural  programs  as 
a  measure  of  the  research  performance  of  NIH;  and  evaluated  quality 
of  research  as  a  function  of  size  of  the  extramural  funding 
institution.  Most  of  the  NIH  studies  focused  on  aggregated 
comparison  studies  (large  grants  vs  small,  large  schools  vs  small 
schools, domestic vs foreign, etc.).

Patent  citation  analysis  has  the  potential  to  provide  insight 
to  the  conversion  of  science  to  technology  [Carpenter,  1983;  Narin, 
1984;  Wallmark,  1986;  Collins,  1988;  Narin,  1988c;  Van  Vianen, 
1990;  Narin,  1992].  Much  of  the  Federal  government  support  of  the 
development of patent citation analysis was by the NSF [e.g.,
Carpenter, 1980; Narin, 1987a]. Some recent studies have focused
on utilization of patent citation analysis for corporate
intelligence and planning purposes (e.g., Narin, 1992b). Some of
the data presented verify further Lotka's Productivity Law, where
relatively few people in a laboratory are producing large numbers
of  patents.  In  the  example  presented  in  Narin  [1992b],  the  patents 
of  the  most  productive  inventor  are  highly  cited,  further 
demonstrating  his  key  importance.  Narin  concludes  that  highly 
productive  research  labs  are  built  around  a  small  number  of  highly 
productive,  key  individuals. 

Despite  its  limitations,  bibliometrics  may  have  utility  in 
providing  insight  into  research  product  dissemination.  For 
laboratories, these studies include:

1.  Examine  distribution  of  disciplines  in  co-authored  papers, 
to  see  whether  the  multidisciplinary  strengths  of  the  lab  are  being 
utilized  fully; 

2. Examine distribution of organizations in co-authored
papers, to determine the extent of lab collaboration with
universities/industry/other labs and countries;

3. Examine nature (basic/applied) of citing journals and
other media (patents), to ascertain whether lab's products are
reaching the intended customer(s);

4.  Determine  whether  the  lab  has  its  share  of  high  impact 
(heavily  cited)  papers  and  patents,  viewed  by  some  analysts  as  a 
requirement  for  technical  leadership; 

5.  Determine  which  countries  are  citing  the  lab's  papers  and 
patents,  to  see  whether  there  is  foreign  exploitation  of  technology 
and  in  which  disciplines; 

6.  Identify  papers  and  patents  cited  by  the  lab's  papers  and 
patents,  to  ascertain  degree  of  lab's  exploitation  of  foreign  and 
other  domestic  technology. 

A  recent  comparative  bibliometric  analysis  of  53  laboratories 
[Miller,  1992]  clustered  the  labs  into  six  types  (Regulation  and 
Control,  Project  Management,  Science  Frontier,  Service,  Devices, 
Survey), and stated that "comparisons of scientific impacts should
be  made  only  with  laboratories  that  are  comparable  in  their  primary 
task  and  research  outputs".  The  report  concluded  further  that: 

1.  Bibliometric  indicators  and  scientific  publications  are  not 
the  only  outputs  that  should  be  measured,  but  the  other  types  of 
outputs  differ  for  different  labs; 

2.  Bibliometric  indicators  are  not  equally  valid  across 
different  types  of  laboratories; 

3.  Bibliometric  indicators  are  less  useful  for  the  evaluation 
of  research  laboratories  involved  in  closed  publication  markets. 

Potential  Normalization  Approaches 

A major problem with bibliometrics is comparing the outputs
of different performers (or performing organizations) who may also
work in different disciplines. Schubert [1993] proposes three types
of normalization to allow cross-organization or cross-discipline
comparisons. In addition, the
author has recently generated a new approach for comparing citation
rates across different disciplines [Kostoff, 1997m], and excerpts
are contained in Section IV-C-2.

1.  The  Publishing  Journal  as  Reference  Standard 

Relating the number of citations received by a paper (or
the average citation rate of a subset of papers published in the
same journal, the Mean Observed Citation Rate) to the average
citation rate of all papers in the journal (the Mean Expected
Citation Rate) yields the Relative Citation Rate. This
indicator shows the relative standing of the paper (or set of
papers) in question among its close companions; its value is
higher/lower than unity as the sample is more/less cited than the
average.
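
The ratio described above can be sketched in a few lines. This is a
minimal illustration, not Schubert's implementation: the function
name and the citation counts are hypothetical, and all papers are
assumed to come from one journal and one publication period.

```python
from statistics import mean


def relative_citation_rate(sample_citations, journal_citations):
    """Return MOCR / MECR for a set of papers drawn from a single journal."""
    mocr = mean(sample_citations)   # Mean Observed Citation Rate of the sample
    mecr = mean(journal_citations)  # Mean Expected Citation Rate of the journal
    if mecr == 0:
        raise ValueError("journal has no citations; the ratio is undefined")
    return mocr / mecr


# Hypothetical data: citation counts for every paper in one journal volume.
# The first three papers are those of the laboratory being assessed.
journal = [12, 8, 10, 2, 4, 0, 6, 6]
lab_papers = journal[:3]

rcr = relative_citation_rate(lab_papers, journal)
print(f"RCR = {rcr:.2f}")  # prints "RCR = 1.67"
```

A value above unity, as here, indicates that the sample is cited
more heavily than the journal average.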

2. The Set of Related Records as Reference Standard

"Bibliographic Coupling" uses the number of references a given
pair of documents have in common to measure the similarity of their
subject matter. A set of papers of the same age that are "similar"
in this sense to a given article will yield an ideal
reference standard for citation assessments.
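
The coupling measure can be illustrated as follows. This is a
sketch under stated assumptions: the document identifiers, reference
lists, citation counts, and the threshold of two shared references
are all hypothetical, and matching on publication age is omitted for
brevity.

```python
from statistics import mean


def coupling_strength(refs_a, refs_b):
    """Number of references two documents have in common."""
    return len(set(refs_a) & set(refs_b))


def related_records(target_id, reference_lists, min_shared=2):
    """Documents sharing at least min_shared references with the target,
    strongest coupling first."""
    target_refs = reference_lists[target_id]
    scored = [
        (doc_id, coupling_strength(target_refs, refs))
        for doc_id, refs in reference_lists.items()
        if doc_id != target_id
    ]
    related = [(doc_id, n) for doc_id, n in scored if n >= min_shared]
    return sorted(related, key=lambda item: (-item[1], item[0]))


# Hypothetical reference lists and citation counts.
reference_lists = {
    "P1": {"R1", "R2", "R3", "R4"},
    "P2": {"R2", "R3", "R4", "R9"},
    "P3": {"R7", "R8"},
    "P4": {"R1", "R4", "R5"},
}
citations = {"P1": 9, "P2": 4, "P3": 7, "P4": 2}

related = related_records("P1", reference_lists)
# The related set supplies the expected citation rate for the target paper.
expected = mean(citations[doc_id] for doc_id, _ in related)
relative_rate = citations["P1"] / expected
```

Here P2 and P4 form the reference set for P1, and P3, which shares
no references with P1, is excluded from the comparison.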

3.  The  Set  of  Cited  Journals  as  Reference  Standard 

A promising method is based on the journals cited in the reference
lists of the articles of the journal in question. These journals
are selected as references (in both senses of the word) by the most
reliable persons, the authors of the journal, and therefore
can justly be regarded as standards of the expected citation rate.

CO-OCCURRENCE  PHENOMENA 

One  class  of  computer-based  analytic  techniques  which  tends  to 
focus  more  on  macroscale  impacts  of  research  exploits  the  use  of 
co-occurrence  phenomena.  In  co-occurrence  analysis,  phenomena  that 
occur  together  frequently  in  some  domain  are  assumed  to  be  related, 
and  the  strength  of  that  relationship  is  assumed  to  be  related  to 
the  co-occurrence  frequency.  Networks  of  these  co-occurring 
phenomena  are  constructed,  and  then  maps  of  evolving  scientific 
fields  are  generated  using  the  link-node  values  of  the  networks. 
Using  these  maps  of  science  structure  and  evolution,  the  research 
policy  analyst  can  develop  a  deeper  understanding  of  the 
interrelationships  among  the  different  research  fields  and  the 
impacts  of  external  intervention,  and  can  recommend  new  directions 
for  more  desirable  research  portfolios.  These  techniques  are 
discussed in more detail in Kostoff [1992a, Appendix III; 1993c;
1994f] and Tijssen [1994]. The Tijssen paper contains an excellent
exposition  on  mapping  techniques  for  displaying  the  structure  of 
related  science  and  technology  fields. 
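
The counting step that underlies such maps can be sketched as
follows. The example assumes co-word analysis on keyword lists; the
keywords and documents are hypothetical, and the raw co-occurrence
count is used as the link value, although published studies often
normalize it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of keywords."""
    counts = Counter()
    for keywords in documents:
        # Sort and de-duplicate so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per document.
documents = [
    ["plasma", "confinement", "fusion"],
    ["fusion", "plasma"],
    ["fusion", "reactor", "economics"],
    ["plasma", "confinement"],
]

links = cooccurrence_counts(documents)
# The strongest links form the backbone of the network map.
strongest = [pair for pair, n in links.items() if n == max(links.values())]
```

Each keyword becomes a node and each nonzero count a link, so the
resulting dictionary is already an edge list for the network.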

In  particular,  co-citation  analysis  has  been  applied  to 
scientific  fields,  and  co-citation  clusters  have  been  mapped  to 
represent research-front specialties [Tijssen, 1994]. Co-word analysis has
been utilized to map the evolution of science under European
(mainly French) government support, and has the potential to
supplement other research impact evaluation approaches.
Co-nomination, in its different incarnations, has been used to
construct social networks of researchers and has the potential, if
expanded to include research and technology impacts in the network
link values, for evaluating direct and indirect impacts of
research. Co-classification is based on co-occurrences of
classification codes in patents, and is used to construct maps of
technology clusters [Engelsman, 1991].

COST-BENEFIT/  ECONOMIC  ANALYSES 

A  comprehensive  survey  examined  the  application  of  economic 
measures  to  the  return  on  research  and  development  as  an  investment 
in  individual  industries  and  at  the  national  level  [OTA,  1986]. 
This  document  concluded  that  while  econometric  methods  have  been 
useful  for  tracking  private  R&D  investment  within  industries,  the 
methods  failed  to  produce  consistent  and  useful  results  when 
applied  to  Federal  R&D  support.  A  more  recent  analysis  focused  on 
economic/  cost-benefit  approaches  used  for  research  evaluation 
[Averch,  1994].  The  methods  involve  computing  impacts  using  market 
information,  monetizing  the  impacts,  then  comparing  the  value  of 
the  impacts  with  the  cost  of  research.  Principal  measures 
described  include  surplus  measures  and  productivity  measures.  With 
known  benefit  and  cost  time  streams,  internal  rates  of  return  to 
R&D  investments  are  then  computed.  The  paper  notes  both  the 
standard  technical  difficulties  with  these  approaches  and  the 
political  and  organizational  difficulties  in  implementing  them. 

Cost-Benefit 

Cost-benefit  analyses  are  a  family  of  related  techniques  which 
include  Cost-Benefit,  Net  Present  Value,  and  Rate-of-Return  [Link, 
1993;  Roessner,  1993;  Averch,  1994].  These  approaches  tend  to  be 
more  widely  used  in  industry  than  government.  For  one,  or  many, 
projects,  the  basic  approach  is  similar.  A  starting  point  in  time 
for  the  research  is  defined.  The  time  stream  of  costs  for  product 
development  is  estimated,  and  the  time  stream  of  benefits  from  the 
product is estimated. Using the time value of money, the costs and
benefits are discounted to the origin of time, and the discounted
benefits are compared with the discounted costs. The main differences
among the approaches to cost-benefit analysis are in the sophistication
of the methods used to estimate the cost and benefit streams and the
time value of money.
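
The discounting arithmetic common to these techniques can be
sketched as follows. The cost and benefit streams, the 10 percent
discount rate, and the function names are hypothetical; annual
compounding is assumed, and the rate of return is found by simple
bisection rather than by any method prescribed in the cited works.

```python
def present_value(stream, rate):
    """Discount a yearly stream (index 0 = origin of time) back to year 0."""
    return sum(amount / (1.0 + rate) ** year for year, amount in enumerate(stream))


def internal_rate_of_return(net_stream, low=0.0, high=1.0, tol=1e-9):
    """Bisection search for the rate at which net present value is zero.
    Assumes the net present value changes sign once on [low, high]."""
    f_low = present_value(net_stream, low)
    if f_low * present_value(net_stream, high) > 0:
        raise ValueError("net present value does not change sign on the interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = present_value(net_stream, mid)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


# Hypothetical streams: two years of development cost, then two of benefit.
costs = [100.0, 50.0, 0.0, 0.0]
benefits = [0.0, 0.0, 110.0, 121.0]
rate = 0.10

pv_costs = present_value(costs, rate)
pv_benefits = present_value(benefits, rate)
net_present_value = pv_benefits - pv_costs
benefit_cost_ratio = pv_benefits / pv_costs

net_stream = [b - c for b, c in zip(benefits, costs)]
irr = internal_rate_of_return(net_stream)
```

All three figures of merit come from the same two streams, which is
why the quality of the stream estimates dominates the result.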

Cost-benefit analyses have limited accuracy when applied to
basic research, both because the large uncertainties characteristic
of the research process degrade the quality of the cost and benefit
data, and because a credible origin of time must be selected for the
discounting computations. As an illustrative example, a
deterministic cost-benefit analysis was performed by the author on
a fusion reactor variant [Kostoff, 1983]. Its real problem, which
pervades and limits any attempt to perform a cost-benefit analysis
on a concept in the basic research stage, was the inherent
uncertainty of controlling the fusion process. This translated to
the inability to predict the probabilities of success and time and
cost schedules for overcoming fundamental plasma research problems
(e.g., plasma stabilities and confinement times); no credible
methods were available. Thus, the main value of the cost-benefit
approach was to show that the potential existed for positive payoff
from the hybrid reactor development, and that there was a credible
region in parameter space in which controlled fusion development
could prove cost effective; what was missing was the likelihood of
achieving that payoff.

A  1991  marginal  cost-benefit  study  weighed  the  costs  of 
academic  research  against  the  benefits  realized  from  the  earlier 
introduction  of  innovative  products  and  processes  due  to  the 
academic  research  [Mansfield,  1991].  The  study  used  survey  data  to 
show  a  very  high  social  rate  of  return  resulting  from  academic 
research.  While  the  method  is  innovative,  future  applications 
using  more  objective  data  sources  would  provide  higher  confidence 
in  the  computed  rates  of  return. 

Production  Function 

Production function approaches to evaluating research returns
invoke economic theory-based assumptions relating outputs to inputs
to generate an estimable model. One only needs time series data
on  output,  capital,  labor,  and  research  expenditures  to  estimate 
empirically  the  marginal  contribution  of  research  to  value  added. 
However,  the  relationship  of  research  to  value  added  is  non-linear 
and  indirect.  Variables  such  as  other  inputs  to  technology  and 
production  and  marketing  functions  complicate  the  research/  value 
added  relationship. 
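
One common specification of such a model is sketched below. The
Cobb-Douglas form and every symbol in it are assumptions introduced
here for illustration; the cited studies may use different
functional forms and variable definitions.

```latex
% Q_t: output (value added)     K_t: capital     L_t: labor
% R_t: stock of research expenditures            t: time
% A, \lambda, \alpha, \beta, \gamma: parameters to be estimated
Q_t = A\, e^{\lambda t}\, K_t^{\alpha}\, L_t^{\beta}\, R_t^{\gamma}

% Taking logarithms gives a form that is linear in the parameters
% and can be fitted to time series by least squares:
\ln Q_t = \ln A + \lambda t + \alpha \ln K_t + \beta \ln L_t
          + \gamma \ln R_t + \varepsilon_t

% The marginal contribution of research to output then follows as
\frac{\partial Q_t}{\partial R_t} = \gamma\, \frac{Q_t}{R_t}
```

In this form the elasticity \gamma carries the entire research
contribution, so any omitted input correlated with research spending
biases the estimate.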

Much  of  the  major  recent  economic  work  relating  economic 
growth/  productivity  increases  to  R&D  spending  has  been  performed 
by three economists [Mansfield, 1980, 1991; Terleckyj, 1977, 1985;
Griliches,  1979,  1994].  Mansfield's  earlier  study  typifies  the 
strengths  and  weaknesses  of  the  production  function  approach.  This 
study  [Mansfield,  1980]  attempted  to  determine  whether  an 
industry's  or  firm's  rate  of  productivity  change  was  related  to  the 
amount  of  basic  research  it  performed.  Mansfield  developed  a 
production  function  which  disaggregated  basic  and  applied  research, 
then regressed rate of productivity increase against many different
variables. The regressions showed a strong relationship between
the  amount  of  basic  research  carried  out  by  an  industry  and  the 
industry's  rate  of  productivity  increase  during  1948-1966. 

The  study  exemplifies  the  problem  inherent  in  multiple 
regression  analyses:  that  of  determining  cause  and  effect  from  what 
is  essentially  correlation.  As  Mansfield  points  out,  "It  is 
possible  that  industries  and  firms  with  high  rates  of  productivity 
growth  tend  to  spend  relatively  large  amounts  on  basic  research, 
but  that  their  high  rates  of  productivity  growth  are  not  due  to 
these  expenditures"  [Mansfield,  1980].  Nor  does  Mansfield's  model 
specify  the  path(s)  by  which  R&D  investment  supposedly  leads  to 
productivity improvements.

A production function approach to cost-efficiency of basic
research  essentially  used  a  regression  analysis  between  outputs  and 
inputs  [Averch,  1987,  1989].  For  proposals,  the  method  involved 
regressing  output  variables  (citations  per  dollar,  graduate 
students  per  dollar)  against  input  variables  (e.g.,  quality  of  the 
investigator's  department,  quality  of  the  investigator,  etc.).  The 
results  gave  some  idea  of  the  importance  of  the  input  variables, 
alone  or  in  combination,  on  the  output  variables.  One  obvious 
potential  application  would  be  prediction  of  proposals  likely  to 
have  high  productivity  based  on  prior  (input)  knowledge.  Much, 
however,  remains  to  be  done  in  identifying  the  appropriate  output 
measures,  the  appropriate  input  measures,  and  the  nature  of  the 
interactions  among  these  measures  for  different  disciplines. 

NETWORK  MODELING  FOR  DIRECT/ INDIRECT  IMPACTS 

A network-based modeling approach was devised to
allow estimation of the direct and indirect impacts of a research
program or collection of research programs. The research program
impacts would be multi-faceted, including impacts on advancing its
own field, on advancing allied fields, on advancing technology, on
supporting operations and mission requirements, etc. A major
feature of the model is inclusion of feedback from the higher
development  categories  (e.g.,  exploratory  development,  advanced 
development)  on  the  advancement  of  research. 

The  model  and  a  subsequent  pilot  study  related  to  Navy  R&D 
have  been  described  in  detail  [Kostoff,  1994i].  In  summary,  a 
network  was  constructed  in  which  each  node  represented  an  area  of 
research  or  development.  The  values  of  the  links  connecting  each 
node  pair  represented  the  impact  of  results  from  the  first  node 
area on the second node area. The total impact of an area of
research on other research or development was obtained by
integrating  over  all  paths  from  the  research  node  to  the  node(s)  of 
interest. 
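
The path integration can be illustrated with a small sketch. It is
not the model of Kostoff [1994i]: the node names and link values are
hypothetical, the value of a path is assumed to be the product of
its link values, and feedback loops are handled by counting only
paths that visit each node once.

```python
def total_impact(links, source, target, _visited=None):
    """Sum, over all simple paths from source to target, of the product
    of the link values along each path."""
    if source == target:
        return 1.0
    visited = (_visited or frozenset()) | {source}
    total = 0.0
    for neighbor, weight in links.get(source, {}).items():
        if neighbor not in visited:  # do not re-enter a node on the same path
            total += weight * total_impact(links, neighbor, target, visited)
    return total


# Hypothetical network: links[a][b] is the impact of results in a on b.
links = {
    "basic research": {"applied research": 0.5, "exploratory development": 0.2},
    "applied research": {"exploratory development": 0.4},
    # Feedback from a higher development category to research.
    "exploratory development": {"applied research": 0.1},
}

direct = links["basic research"]["exploratory development"]
total = total_impact(links, "basic research", "exploratory development")
indirect = total - direct
```

In this example half of the total impact of basic research on
exploratory development arrives indirectly, through applied
research.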


EXPERT  NETWORKS 

Research  Impact  Assessment  is,  at  its  essence,  a  diagnostic 
process  with  many  diagnostic  tools.  In  other  fields  of  endeavor, 
such  as  Medicine  and  Machinery  Repair,  expert  systems  are 
increasingly  being  used  as  diagnostic  tools  or  as  support  to 
diagnostic  processes.  Recently,  there  have  been  efforts  to  develop 
expert  system  approaches  combined  with  artificial  neural  networks 
(expert  networks)  for  use  in  R&D  management,  including  RIA 
[Odeyale,  1993;  Odeyale  and  Kostoff,  1994a,  1994b].  A  brief 
summary  of  these  efforts  follows. 

The product of these efforts is the Research-Management Expert
Network (R-MEN), which is characterized by two complementary tools:
Organizational/Professional Development (O/PD) and Expert Network. The
latter technology comprises an expert system (left side
brain) and an artificial neural network (right side brain). Given
a set of research and research management policies and strategies,
R-MEN learns concepts that hierarchically organize those policies
and strategies and uses them in classifying/triaging research
proposals.

The  framework  of  Research-Management  Expert  Network 
(R-MEN)  consists  of  a  knowledge  base  and  a  data  base.  Feeding  into 
the  knowledge  base  are  four  modules:  a  policy/  strategy  impartation 
module  and  a  proposal  data  acquisition  module,  both  of  which 
receive input from the O/PD process; and a research impact
calculation module and a proposal review module. The knowledge
base then feeds into the data base through five modules: a project
selection  module,  resources  allocation  module,  project  evaluation 
and  control  module,  investigator  evaluation  module,  and 
organization  evaluation  module. 

R-MEN  is  implemented  in  three  phases.  Phase  1  includes  the 
development  of  the  strategic  plan,  which  defines  and  communicates 
longer-term  research  directions,  and  the  development  of  the 
operating  plan,  which  specifically  identifies  the  projects  that 
will  implement  the  strategic  plan  taking  into  consideration  the 
goals,  quantifiable  objectives  and  development  of  the  individual 
investigator  and  the  organization. 

Phase 2 represents the education and management
support needed to prepare the staff to participate in such an
"Action  Research"  effort.  This  phase  identifies  and  utilizes  the 
critical  components  required  to  develop  an  environment  that 
facilitates  participative  research  management  activities.  A 
significant  activity  occurring  during  this  phase  is  daily 
verification  of  individual  scheduled  training  and  development.  If 
an  individual  has  no  recorded  training  and/or  development  within  a 
preset  period,  the  system  will  generate  and  send  a  report  through 
E-mail directly to the office of the director for R&D. The system
will be able to look at training and/or development
description(s) and compare them with the background of the
individual to determine if the training and/or development is
suitable for that individual.

Phase  3  represents  a  means  by  which  participative  methods  can 
be  put  into  operation  in  developing  productivity  tracking  systems. 
Significant  activities  occurring  during  this  phase  include  project 
evaluation  and  control.  This  entails  periodic  monitoring  of 
project  milestones  for  applied  research,  and  research  objectives 
for  the  more  basic  research.  If  a  project  has  no  recorded 
fulfillment  of  a  milestone  within  a  preset  period,  the  system  will 
generate  and  send  a  report  through  E-mail  directly  to  the  office  of 
the  director  for  R&D. 
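
The monitoring rule shared by Phases 2 and 3 can be sketched as
follows. This is an illustration only, not the R-MEN implementation:
the project names, dates, 180-day period, and function names are
hypothetical, and the sketch builds the report text without sending
any E-mail.

```python
from datetime import date, timedelta


def overdue(last_recorded, today, preset_period):
    """Names whose most recent recorded event is missing or older than
    the preset period."""
    late = []
    for name, last in last_recorded.items():
        if last is None or today - last > preset_period:
            late.append(name)
    return sorted(late)


def build_report(late, kind):
    """Report text for the director's office, or None if nothing is overdue."""
    if not late:
        return None
    return f"No recorded {kind} within the preset period: " + ", ".join(late)


# Hypothetical record of the last fulfilled milestone for each project.
last_milestone = {
    "Project A": date(1997, 1, 15),
    "Project B": date(1996, 6, 1),
    "Project C": None,  # no milestone recorded yet
}

late_projects = overdue(last_milestone, date(1997, 3, 1), timedelta(days=180))
report = build_report(late_projects, "milestone")
```

The same two functions apply unchanged to training records, with
individuals in place of projects.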

If  R-MEN  is  initially  used  concurrently  with  present  research 
review  processes,  it  will  serve  as  a  supplement  in  the  form  of  a 
guide  to  data  generation,  acquisition  and  processing,  and  a 
validity  check.  With  appropriate  implementation  and  maintenance, 
this  knowledge  technology,  which  utilizes  demonstrated  and  proven 
approaches,  methods,  procedures  and  techniques  in  an  innovative  and 
unique  way,  could  lead  to  the  following  benefits: 

1. Provide a means for effective, policy- and strategy-oriented
management through outcomes-management.

2.  Improve  management  quality,  reduce  operation  costs,  and 
increase  productivity  and  public  trust. 

3. Foster impact evaluation to document Federally funded
program  and  management  effectiveness. 

4.  Provide  short-term  (three-year)  program  progress  tracking 
and long-term (ten-year) result(s) impact tracking.

5.  Shield  administrators,  managers,  and  other  policy-makers 
from  the  complexity  of  the  mathematics  of  the  inference  machine. 

6.  Permit  the  evaluation  of  a  range  of  alternatives. 

7.  Permit  handling  large  amounts  of  data. 

8.  Permit  policy-makers  to  have  a  better  understanding  of 
existing  technical  attributes  of  and  capabilities  for  potential 
projects. 

9.  Facilitate  choice  of  strategy  compatible  with  agency 
structure  and  processes,  and  with  the  policy  or  the  nature  of 
decision  making  for  activities  scheduling  and  control. 

QUANTITATIVE  METHODS  CONCLUSIONS 

Bibliometric  methods  are  valuable  in  quantifying  the  output  of 
research.  Because  they  do  not  address  quality,  and  their  numeric 
outputs  are  subject  to  multiple  interpretations,  they  are  not  self- 
contained  assessment  methods.  They  are  a  valuable  supplement  to 
the  subjective  interpretative  methods  such  as  peer  review. 

Economic  approaches  have  limited  value  when  applied  to 
assessing  the  potential  of  fundamental  research,  because  of  the 
uncertain  nature  of  the  data.  Their  validity  increases  as  the 
research  becomes  more  applied,  and  cost  and  benefit  streams  can  be 
estimated  more  accurately. 

As databases become more extensive and computer power
continues to increase, data-intensive quantitative analyses will
increase in use. Approaches such as co-occurrence, network
modeling, and expert networks described above will become more
commonplace in research assessment.

For  those  fields  of  technology  in  which  patents  are  an 
important  mode  of  communication,  patent  citation  analysis  offers 
insight  into  the  conversion  of  science  to  technology.  Many  of  the 
reported  patent  citation  analysis  studies  tend  to  focus  on 
technical  intelligence  for  corporate  applications  [Narin,  1994]. 

RESEARCH  REQUIREMENTS  FOR  RIA  SUMMARY 

More  retrospective  studies  are  required  using  modern 
technologies  such  as  information  processing  and  computerized 
citation  databases.  The  tracing  of  the  indirect  impacts  of 
research  should  be  emphasized.  Network  approaches  are  valuable  in 
this  regard.  More  rigorous  peer  review  experiments  should  be 
performed,  to  understand  better  the  issues  of  cost,  validity, 
reliability,  quality,  and  feedback.  The  text  describes  the  main 
parameters to be examined in these studies. For bibliometrics,
studies are required to address the normative comparisons across
different disciplines, as well as to examine optimal ways to
combine multiple indicators into a few figures of merit.

Central  to  the  assessment  of  research  is  the  capability  to 
handle  all  phases  of  the  information  creation,  flow,  and 
integration  cycle.  The  explosion  of  available  information  in  the 
last  decade  requires  the  utilization  of  large  databases  to  handle 
this  information  in  support  of  RIA. 

In  particular,  sophisticated  data  collection,  analysis,  and 
interpretation  schemes  can  track  the  dissemination  of  information 
flowing  from  research  to  other  applications.  A  credible  research 
product  tracking  scheme  can  help  identify  the  indirect  impacts  of 
research  more  precisely,  and  can  improve  correlations  between 
research  evaluation  predictions  (such  as  peer  review  and 
bibliometrics)  and  downstream  impacts. 

Central  to  credible  work  in  predicting  and  tracking  the 
diffusion  of  information  from  research  is  a  database  of  research 
products  at  various  evolutionary  stages  which  can  feed  the 
predictive  models.  This  database  of  research  products  could  be 
linked  in  part  with  databases  of  sponsored  research  and  technology. 
Since  the  research  product  evolutionary  pathways  transcend  the 
research  originating  organization,  and  can  intersect  all  societal 
sectors,  the  cooperation  of  many  public  and  private  organizations 
would  be  required  to  develop  a  database  of  research  products  in 
their  evolutionary  stages.  Development  and  construction  of  such  a 
database  should  start  now. 

Comprehensive  databases  describing  sponsored  research  and 
development  programs  in  many  funding  agencies  and  organizations, 
with  sophisticated  software  to  provide  rapid  access  to  the  database 
contents,  can  help  improve  the  selection,  management,  and 
evaluation  of  research  programs.  Research  gaps  can  be  identified, 
duplication  of  programs  can  be  minimized,  complementary  and  joint 
programs  can  be  established,  substantial  leveraging  of  other  agency 
programs  can  be  implemented,  and  technology  planning  can  be 
improved  with  better  awareness  of  maturing  research  programs. 

To  fully  understand  a  research  program,  especially  in  the 
assessment  of  that  program,  evaluators  must  be  cognizant  of  the 
large  body  of  research  being  conducted  throughout  the  world.  In 
addition,  to  fully  understand  the  impacts  of  research  on  different 
technologies,  evaluators  must  be  cognizant  of  the  large  body  of 
existing  and  developmental  technology  throughout  the  world,  and  the 
existing  and  potential  shortcomings  in  those  technologies. 

With  the  advent  of  high  speed  and  high  storage  capacity 
computers,  and  advances  in  database  software  packages,  the 
capability  exists  now  to  make  large  amounts  of  information 
available  to  researchers  and  evaluators.  In  particular,  the 
capability  exists  to  provide  information  about  funded  research  and 
technology  development  programs  being  conducted  throughout  the 
world,  as  well  as  information  about  existing  technologies. 

Tailored  databases  which  contain  information  about  the 
structural  relationships  among  projects  and  programs  can  help 
identify  critical  paths  for  development  in  R&D  programs.  This  is 
important in allocating resources among programs in
mission-oriented agencies and other organizations.

Sophisticated  algorithms  for  manipulating  and  interpreting 
large  technical  textual  databases  would  allow  pervasive  themes  of 
the  databases  to  be  identified,  as  well  as  the  relationships  among 
the themes and sub-themes. Low frequency anomalous relationships
which  could  be  important  are  identified  easily  with  these 
techniques.  The  algorithms  would  also  allow  identification  of  the 
translations  between  research  areas  and  technology  areas  in  the 
databases,  and  would  provide  guidelines  and  roadmaps  for  increasing 
the  efficiency  of  searching  unfamiliar  databases. 

These  algorithms,  and  subsequent  analyses,  have  the  potential 
of  identifying  emerging  research  and  development  areas  contained 
within the databases but not readily discernible. The software can
also help in taxonomy construction, with the taxonomy elements
obtained 'bottom-up' from the database language, rather than
'top-down' using an authoritative directed approach. Many different
types  of  taxonomies  could  be  constructed  from  the  full  text 
database,  and  relationships  among  the  different  elements  of  the 
different  taxonomies  could  be  obtained.  Finally,  by  looking  at  the 
changes  in  the  structure  of  research  fields  over  time,  the  impact 
of  sponsoring  organization  intervention  can  be  ascertained. 

RIA  OPTIONS  SUMMARY 

This  final  Handbook  section  spans  topics  ranging  from 
investment  strategy  and  evaluation  protocols  to  algorithms  for 
allocating  funds  based  on  quantitative  evaluation  results.  In  this 
section,  the  research  evaluation  guidance  recommended  for  Federal 
agencies  is  described.  While  the  focus  of  this  section  is  on 
Federal  agency  evaluations,  the  principles  and  implementation 
mechanics  are  sufficiently  broad  to  be  applicable  to  most 
organizations. 

III.  INTRODUCTION 

Research  is  the  pursuit  and  production  of  knowledge  by 
the  scientific  method.  Research  Productivity  is  the  generation  of 
tangible  and  intangible  products  from  research.  Research 
Efficiency  is  the  productivity  of  research  per  unit  of  input 
resource.  Research  Impact  is  the  change  effected  on  society  due  to 
the  research  product.  Research  Effectiveness  is  a  measure  of  the 
focus  of  impact  on  desired  goals. 

The  underlying  value  in  the  practice  of  research  is  truth. 
What achieves truth most efficiently is the most valuable. The
underlying  value  in  the  administration  of  research  is  utility. 
What  is  most  useful,  in  addition  to  being  true,  is  the  most 
valuable.  In  the  administration  of  basic  research,  usefulness  to 
neighboring  sciences  is  the  main  guiding  criterion.  Those  basic 
scientific  activities  that  impart  unity  to  science  are  to  be 
preferred  to  those  that  do  not.  Unity  is  the  ultimate  value  in  the 
formulation of a grand strategy for basic research [Weinberg, 1989].

To measure the impact of research requires the measurement of
knowledge. However, knowledge cannot be measured directly. What
can  be  observed  and  measured  are  the  expressions  of  knowledge,  such 
as  papers,  patents,  and  students  educated.  Measures  of  the 
expressions  of  knowledge  resulting  from  research  must  of  necessity 
provide  an  incomplete  picture  of  the  research  product.  The 
concluding  hypothesis  that  will  permeate  the  remainder  of  this 
Handbook  is  that  the  greater  the  variety  of  measures  and 
qualitative  processes  used  to  evaluate  research  impact,  the  greater 
is  the  likelihood  of  converging  to  an  accurate  understanding  of  the 
knowledge  produced  by  research  [Irvine,  1984]. 

Assessing the impact of a research program involves identifying the variety
of  expressions  of  knowledge  produced,  as  well  as  the  changes  which 
these  expressions  effect  on  a  multitude  of  different  potential 
research  targets  (other  research  areas,  technology,  systems, 
operations,  other  organizational  missions,  education,  social 
structures,  etc.).  While  some  impacts  may  be  tangible  (new 
instruments  developed,  new  research  fields  stimulated,  students 
trained  in  new  disciplines),  many  may  be  intangible  (e.g.,  a 
designer  of  equipment  may  receive  new  insights  from  having  attended 
a research seminar), and difficult to identify, much less quantify.

Evaluation  of  research  impact  is  further  complicated  by  the 
different  perspectives  and  motivations  of  the  assessors.  The 
quantitative  approaches  require  interpretation  by  the  assessors, 
and  the  qualitative  approaches  rest  on  the  purely  subjective 
judgements  of  the  assessors.  The  importance  of  a  research  program 
represents  a  weighting  of  its  quantitative  and  qualitative  impacts 
on  the  different  potential  targets  of  research.  Yet  this  weighting 
is  dependent  on  the  multiple  perspectives  of  the  assessors, 
including  technical,  organizational,  and  personal  perspectives 
[Linstone,  1989],  and  the  interplay  among  these  perspectives  is  not 
always obvious. Thus, not only is the impact of the research on
each of its potential targets dependent on some unknown function of
the multiple perspectives of the assessors, but the value and
relative ranking of the targets depend on these multiple
perspectives as well. Selection of technical methodologies,
measures,  and  assumptions  by  the  assessors  may  be  driven 
significantly  by  organizational  and  personal  motivations. 

Understanding  and  measures  of  the  impact  of  research  are 
desired  by  research  sponsors  at  every  stage  of  the  research  cycle, 
including  research  topic  identification,  research  selection, 
research  management  and  evaluation,  and  research  termination/ 
transition  and  retrospective  analysis.  Research  impact  evaluations 
are of potential use to sponsors in: "Deciding whether to continue
or  end  the  program  or  to  increase  or  decrease  its  budget;  changing 
the  program,  or  its  management,  to  improve  the  probability  of 
success;  altering  policies  regarding  the  procurement,  conduct,  or 
management  of  research;  and/or,  building  support  with  policy  makers 
and  other  constituencies  of  the  program"  [Salasin,  1980].

In  terms  of  actual  use,  a  major  1985  survey  of  strategic 
evaluation  methods  for  research  programs  concluded: 

"Peer  review  is  generally  considered  the  touchstone  of 
research  program  evaluation  techniques.  Bibliometric  techniques
have  demonstrated  considerable  utility  for  research  program
evaluation;  many  studies  recommend  that  they  be  used  in  conjunction
with  other  techniques.  Econometric  methods  are  frequently 
propounded,  but  save  for  cost-benefit  analysis,  most  of  these 
techniques  have  not  received  widespread  currency  for  evaluating 
research  programs...  Based  upon  our  review  of  the  literature,  it 
would  appear  that  formal,  strategic  evaluation  of  research  programs 
is  not  performed  on  a  regular  basis  in  either  government  or 
industrial  laboratories.  Government  funding  programs  are  evaluated 
on  an  irregular  basis  as  well.  We  surmise  that  much  evaluation  is 
informal  and  non-technique  oriented  and  hence  not  reported  outside 
of  the  organization  which  conducts  it"  [Kerpelman,  1985].

There  are  many  bibliographies  containing  the  large  number  of 
methods  developed  to  evaluate  research  conduct,  impact,  and 
benefits  [Wirt,  1974;  Salasin,  1980;  Gibbons,  1985;  Logsdon,  1985; 
Kerpelman,  1985;  OTA,  1986;  Gibbons,  1987;  Luukkonen-Gronow,  1987;
Averch,  1990;  Hall,  1990;  Johnston,  1990;  OTA,  1991;  Kostoff, 
1991c,  1991e,  1992a,  1993b,  1992d,  19941].  A  relatively  small 
fraction  of  the  methods  are  actually  used  in  practice  by  Federal 
research  sponsors  and  evaluators.  Of  those  used  in  practice,  only 
a  small  fraction  of  the  results  of  impact  studies  are  reported  in 
the  published  literature,  and  an  even  smaller  fraction  are  accepted 
by  the  final  Federal  decision-makers.  While  a  number  of  the
methods  actually  used  in  practice  by  Federal  research  sponsors  to
measure  impact  will  be  described  in  the  remainder  of  this  Handbook,
one  objective  will  be  to  focus  on  the  strengths  and  weaknesses  of
these  selected  methods.

IV.  RESEARCH  IMPACT  EVALUATION  TECHNIQUES 

Luukkonen-Gronow  [1987]  and  Averch  [1990]  provide  summaries  of
major  research  evaluation  methods  used  throughout  the  world.  The
three  main  categories,  in  order  of  frequency  of  use,  are:  Peer
Review,  Non-Quantitative  Case  Study  and  Anecdotal  Approaches,  and
Quantitative  Methods.  Specific  variants  of  the  qualitative,
semi-quantitative,  and  quantitative  methods  are  described,  and  examples
of  the  more  prominent  applications  in  the  U.S.  are  presented.

IV-A.  QUALITATIVE  METHODS  (PEER  REVIEW) 

IV-A-1.  PEER  REVIEW  BACKGROUND  AND  ISSUES 
Introduction 

Peer  review  of  research  is  overwhelmingly  the  method  of  choice 
in  practice  in  the  U.S.,  as  well  as  in  the  rest  of  the  world
[Salasin,  1980;  Logsdon,  1985;  Chubin,  1990;  Chubin,  1994;  Kostoff, 
1995a;  Stamps,  1997a].  Its  objectives  range  from  being  an 
efficient  resource  allocation  mechanism  to  a  credible  predictor  of 
research  impact.  Due  to  the  pressures  for  increased  accountability 
of  federal  research  expenditures,  there  is  the  potential  for  use  of
research  program  peer  review  to  increase  substantially  in  the  near
future.  Before  the  peer  review  specifics  are  addressed,  reasons 
for  the  potential  increased  use  will  be  discussed. 

In  1993,  the  Government  Performance  and  Results  Act  (GPRA)  was 
enacted  into  law  [GPRA,  1993].  GPRA  applies  to  all  federal  outlay
programs,  and  has  three  components:  strategic  plans,  annual
performance  plans,  and  metrics  to  show  how  well  the  annual  plans
are  being  met.  Since  the  Act  became  law,  there  have  been  many
federal  interagency  meetings  to  ascertain  how  the  third  requirement
of  the  Act,  performance  metrics,  could  be  implemented  to  properly
portray  the  progress  and  accomplishments  of  research,  especially 
basic  research.  The  emerging  consensus  from  the  basic  research 
sponsor  and  performer  communities  is  that  there  exists  a  major 
mismatch  between  the  stated  requirements  of  GPRA  and  what  is 
required  to  determine  the  health  of  a  research  program. 

However,  the  GPRA  legislation  states  that  if  "it  is  not 
feasible  to  express  the  performance  goals  for  a  particular  program 
activity  in  an  objective,  quantifiable,  and  measurable  form,  the
Director  of  the  Office  of  Management  and  Budget  may  authorize  an
alternative  form"  [GPRA,  1993].  In  a  companion  article  in  Science
[Kostoff,  1997h],  it  is  proposed  that  peer  review  be  used  as  the
dominant  basic  research  program  health  diagnostic  for  GPRA, 
supplemented  by  bibliometric  and  other  measures.  There  is  a 
growing  consensus  in  the  larger  research  community  that  use  of  peer 
review  is  a  more  appropriate  tool  to  measure  basic  research  program 
performance  in  order  to  satisfy  the  GPRA  requirements.  If  the  GPRA 
oversight  agencies  agree  with  this  philosophy,  then  the  volume  of 
research  program  peer  reviews  across  the  federal  agencies  will 
increase  dramatically. 

However,  not  only  will  the  volume  of  program  peer  reviews
change,  but  the  conduct  of  the  reviews  will,  of  necessity,  change  as  well.
If  GPRA  is  fundamentally  a  budgetary  instrument  [Brown,  1996],  then
the  performance  evaluation  results  which  feed  into  the  performance
budgeting  process  must  be  of  the  highest  quality.  The  methods
chosen  to  obtain  these  performance  evaluation  results,  program  peer
review  and  the  supplementary  quantitative  performance  measures,
would  require  more  rigorous  and  standardized  operational
characteristics  (process  selection,  reviewer  selection,  etc.).

The  purpose  of  the  present  section  is  to  bring  to  the 
attention  of  the  relevant  research  sponsoring,  oversight,  managing, 
and  performing  communities  the  underlying  issues  surrounding 
research  program  peer  review.  If  these  issues  can  be  addressed 
comprehensively  prior  to  full  scale  GPRA  implementation,  then 
procedures  could  be  developed  to  conduct  peer  review  in  a  manner 
which  will  not  only  support  the  performance  budgeting  process  but 
could  add  value  to  the  research  program  as  well.  To  ensure  that
the  present  section  reflects  the  experiences  and  findings  of  the 
larger  research  evaluation  community,  principles  and  findings  from 
the  manuscript  and  proposal  peer  review  literature  will  be 
utilized,  where  applicable,  to  illuminate  the  research  program 
review  issues  and  help  bridge  the  gaps  in  the  research  program 
review  literature. 

There  are  three  major  components  of  the  present  section.  The
main  body  of  the  text  (IV-A-1)  addresses  the  underlying  issues
surrounding  research  program  peer  review.  The  next  section  (IV-A-2)
summarizes  research  program  peer  review  practices  for  selected
federal  agencies.  The  final  section  (IV-A-3)  describes  in  detail
a  peer  review  process  algorithm  which  embodies  the  best  practices 
of  federal  agencies  and  many  of  the  principles  espoused  in  the  main 
body  of  the  present  text.  First,  some  definitions  and  background 
will  be  presented,  to  set  the  stage  for  the  detailed  examination  of 
the  peer  review  issues. 

DEFINITIONS  AND  BACKGROUND 

Research  Program  Definition 

Fiscally,  a  research  program  is  a  collection  of  funded 
research  components.  These  elements  could  be  subprograms,
projects,  or  individual  work  units  (Principal  Investigators,  PIs).
Conceptually,  a  program  is  greater  than  the  sum  of  its  components, 
just  as  the  living  human  body  is  greater  than  the  sum  of  its 
component  molecules.  A  program  includes  the  intelligence  or 
inherent  logic  which  links  the  components  to  each  other  and  to  the 
program's  overall  objectives,  just  as  the  living  human  body 
includes  the  intelligence  which  links  the  molecules  to  each  other 
and  to  the  homeostatic  operation  of  the  body.  Thus,  the  intrinsic 
quality  of  a  research  program  is  not  merely  the  sum  of  the 
qualities  of  the  component  projects,  but  depends  on  the  quality  of 
the  structural  relationships  among  the  projects  as  well. 

Review  of  a  research  program  can  then  be  viewed  as  consisting 
of  two  elements:  1)  "review  of  a  program  of  research",  which
examines  the  nature  of  the  component  projects,  and  is  commonly
referenced  as  an  in-depth  technical  review,  and  2)  "review  of  a
research  program",  which  examines  the  nature  of  the  structural
relationships  among  the  projects  and  between  the  projects  and  their 
external  environment,  and  is  commonly  referenced  as  a  management 
review.  These  two  elements  could  be  merged  operationally  into  a 
single  review,  or  could  be  performed  separately. 

A  program  could  be  single  research  discipline  intra-  or
inter-agency;  multiple  discipline  intra-  or  inter-agency;  multiple
discipline  vertically  integrated  intra-  or  inter-agency;  multiple 
discipline  multi-agency  multi-national;  or  other  variants  of  the 
above.  The  nominal  program  discussed  in  this  section  is  assumed  to 
be  intra-agency;  the  nominal  review  is  assumed  to  be  intra-agency. 
Some  organizations  review  by  disciplines,  some  organizations  review 
by  multi-discipline  management  unit,  and  in  some  organizations 
disciplines  coincide  with  management  units. 

Peer  Review  Definition 

The  classical  definition  of  a  peer  is  "A  person  who  has  equal 
standing  with  another".  A  peer  review,  then,  is  a  review  of  a 
person  or  persons  by  others  of  equal  standing.  The  crucial  issue 
then  becomes  how  'equal  standing'  is  defined. 

Most  research  peer  reviews  with  which  the  author  is  familiar,
whether  of  journal  research  manuscripts,  research  proposals  for
funding,  or  research  project  performance  reviews,  tend  to  employ
peer  reviewers  who  are  experts  in  the  specific  research  area  of  the
person  or  group  under  review.  Depending  on  the  relative  levels  of
expertise  between  the  reviewers  and  reviewees,  the  reviewers  may  or
may  not  be  de  facto  peers.  Applied  to  research  program  review,
such  experts  are  most  competent  for  the  in-depth  technical  subset
defined  above  as  "review  of  a  program  of  research".  The  focus  of 
this  subset  is  on  the  intrinsic  nature  of  the  collection  of 
research  projects  within  the  program,  especially  on  their  quality, 
accomplishments,  ongoing  problems,  unexpected  findings  and 
discoveries.

The  focus  of  the  management  review  subset  defined  above  as 
"review  of  a  research  program"  is  on  the  structural  relationships 
among  the  research  projects  within  the  program.  This  subset 
addresses  issues  such  as  mission  relevance,  budget  adequacy, 
program  staff,  objectives,  and  procedures.  To  address  the  issues 
of  this  subset,  additional  types  of  peers  to  those  of  the  first 
subset  are  required. 

For  the  purposes  of  the  present  document,  a  more  liberal 
interpretation  of  a  peer  than  normally  employed  will  be  used  to 
encompass  the  requirements  for  addressing  both  subsets  of  research 
program  peer  review.  This  expanded  definition  of  a  peer  describes 
the  types  of  reviewers  that  the  author  has  tended  to  choose  in 
conducting  research  program  peer  reviews  which  combine  both  subsets 
of  program  review  into  a  single  process.  In  this  more  inclusive 
definition,  a  peer  may  be  a  person  expert  in  the  specific  technical 
area  of  the  research  being  reviewed,  in  allied  technical  areas  to 
the  research  being  reviewed,  in  technology  areas  which  may  be 
impacted  eventually  by  the  research  being  reviewed,  and  in  systems 
and  operational  areas  which  may  be  impacted  in  the  future  by  the 
research  being  reviewed.  These  different  types  of  peers  are 
required  to  examine  the  different  facets  of  a  research  program 
which  could  have  impacts  far  beyond  the  specific  research  area 
being  reviewed. 

Research  Program  Peer  Review  Background 

Research  evaluation  methodologies  can  be  divided  generically 
into  three  groupings  [Kostoff,  1995c,  1996a]:  Qualitative  (e.g., 
peer  review);  Semi-Quantitative  (e.g.,  retrospective);  and 
Quantitative  (e.g.,  bibliometric).  Peer  review  of  research  is
overwhelmingly  the  method  of  choice  in  practice  in  the  U.S.,  as
well  as  the  rest  of  the  world  [Salasin,  1980;  Logsdon,  1985; 
Chubin,  1990;  Chubin,  1994;  Kostoff,  1995a;  Stamps,  1997a]. 
Presently,  the  major  applications  of  research  peer  review  are,  in 
decreasing  usage  order:  journal  manuscript  submission  review; 
proposal  review;  project  and  program  review;  faculty  performance 
review;  and  dissertation  review. 

Most  of  the  peer  review  literature  has  focused  on
manuscript  and  proposal  review.  For  example,  a  1993  literature
survey  [Speck,  1993]  compiled  780  abstracts  of  papers  on  peer 
review,  of  which  643  papers  were  on  journal  peer  review.  According 
to  Armstrong  [Armstrong,  1997],  101  of  these  provided  empirical
evidence.  Relatively  few  studies  have  been  done  on  the  issues  and 
principles  underlying  project  or  program  review  and  reported  in  the 
open  literature.  This  conclusion,  complemented  by  Speck's  and 
Armstrong's  findings,  was  confirmed  most  graphically  by  a  recent 
peer  review  literature  survey  conducted  by  the  author.  Over  half 
the  documents  retrieved  were  either  letters  to  the  editors  of 
journals,  or  editorials  (or  their  equivalent).  The  papers  on
program  review  tended  to  be  reports  of  technical  and  statistical
results  of  the  review,  with  little  or  no  focus  on  the  principles
and  issues  underlying  the  peer  review  components.  The  papers  that
did  address  peer  review  component  principles  related  mainly  to
manuscript  reviews  and,  less  often,  to  proposal  reviews.

Peer  reviews  of  research  programs,  when  done  at  all,  are  not 
nearly  as  consistent  across  the  research  sponsoring  organizations 
as  are  the  manuscript  and  proposal  reviews.  Program  reviews  tend 
to  range  from  very  informal  personal  discussions  to  tens  of  formal 
panel  reviews.  Most  of  the  people  who  conduct  program  reviews  do 
not  document  them  in  the  literature,  and  most  of  the  principle  and 
concept  papers  in  the  peer  review  literature  are  written  by  people 
who  have  never  conducted  a  research  program  peer  review. 
Consequently,  there  are  two  major  gaps  in  the  literature  on 
research  program  peer  review.  First,  few  papers  have  been
published,  and  second,  most  of  the  concept  and  principle
papers  that  do  exist  bear  little  relation  to  the  reality  of 
conducting  a  program  review. 

To  identify  and  address  some  of  these  gaps,  a  number  of  peer 
review  issues  will  be  examined  now.  These  issues  were  selected 
from  a  taxonomy  of  categories  generated  by  the  author's  recent  peer 
review  literature  survey,  as  well  as  from  previous  assessments  of 
problems  with  peer  review  and  other  research  evaluation  approaches 
[Kostoff,  1996a].  The  headings  of  the  topical  issues  addressed  in
the  main  body  of  this  text  immediately  following  this  section
include:  Objectives  and  Purposes  of  Peer  Review;  Quality  of  Peer
Review;  Impact  of  Peer  Review  Manager  on  Quality;  Selection  of  Peer
Reviewers;  Selection  of  Evaluation  Criteria;  Secrecy  (Reviewer  and
Performer  Anonymity);  Objectivity/Bias/Fairness  of  Peer  Review;
Normalization  of  Peer  Review  Panels;  Repeatability/Reliability  of
Peer  Review;  Effectiveness/Predictability  of  Peer  Review;  Costs  of
Performing  a  Peer  Review;  Ethical  Issues  in  Peer  Review;
Alternatives  to  Peer  Review;  and  Recommendations  for  Further  Research
in  Peer  Review.

IV-A-2.  PEER  REVIEW  PRINCIPLES 

OBJECTIVES/  PURPOSE  OF  PEER  REVIEW 

Peer  review  supports  many  diverse  purposes.  It  serves  as  a 
quality  filter  to  conserve  resources:  papers  published  in  peer- 
reviewed  journals  are  assumed  to  be  above  a  minimal  quality 
threshold,  such  that  the  reader  can  focus  limited  time  resources  on 
the  highest  quality  documents  assumed  to  be  contained  in  these 
journals;  projects  and  programs  selected  for  initiation  or
continuation  by  peer  review  are  assumed  to  be  above  a  minimal 
quality  threshold,  and  precious  labor  and  hardware  resources  can  be 
focused  on  these  high  quality  tasks  selected.  Peer  review  has  the 
potential  to  add  value  to,  and  improve  the  quality  of,  the 
manuscript  or  program  under  review.  Peer  review  can  provide  an 
imprimatur  of  legitimacy  and  competency  to  increase  a  program's 
visibility  and  support.  The  objectives  of  peer  review  range  from 
being  an  efficient  resource  allocation  mechanism  to  a  credible 
predictor  of  research  impact.  A  properly  conducted  research 
program  peer  review  can  provide  credible  indication  to  the  research 
sponsors  of  program  quality,  program  relevance,  management  quality, 
and  appropriateness  of  direction  [Alassaf,  1996;  Armstrong,  1997; 
Cram,  1992;  Gabel,  1992;  GERMANY,  1988;  Kessler,  1992;  Levine, 
1988;  Palli,  1993;  Rainville,  1991;  Ramsay,  1989;  Stull,  1989; 
Wakefield,  1995;  Wicks,  1992]. 

The  literature  contains  some  quantitative  studies  which 
indicate  some  value  added  by  peer  review.  For  example,  recent 
studies  evaluated  the  effects  of  peer  review  and  editing  on 
manuscript  quality  [Goodman,  1994],  and  the  effects  of  peer  review 
and  editorial  processes  on  the  readability  of  original  articles 
[Roberts,  1994].  They  concluded  that  peer  review  and  editing 
improve  the  quality  of  medical  research  reporting,  as  well  as  the 
readability  of  original  articles  and  their  abstracts.  They  did  not 
address  whether  the  quality  of  the  research  was  improved,  nor  do 
other  literature  articles. 

From  the  author's  experience,  there  are  three  times  during  the 
research  program  peer  review  process  when  value  is  added.  First  is 
the  period  between  reviews,  when  the  researchers  do  their  work 
knowing  that  it  will  be  subject  to  high  quality  review.  The  value 
added  during  this  performance  phase  is  that  the  researchers  will 
maintain  a  higher  level  of  performance  quality  because  of  the 
knowledge  of  the  forthcoming  expert  review.  For  example, 
performers  will  be  less  inclined  to  work  on  their  theses  for 
decades  if  they  know  that  they  will  be  evaluated  periodically. 
Program  managers  will  be  more  likely  to  continually  update  the 
balance  and  relationships  among  their  component  projects,  rather 
than  allow  poor  performers  to  languish,  if  they  know  that  a  review 
is  forthcoming. 

The  analogy  is  to  a  well-known  speed  trap  on  a  highway.  The 
knowledge  that  a  stretch  of  road  is  well  policed  is  sufficient  to 
keep  the  average  speed  within  the  posted  limit.  The  fact  that  the 
officers  write  relatively  few  tickets  in  this  area  is  not  a  measure 
of  effectiveness  of  the  speed  trap.  It  would  be  useful  if  studies
were  done  comparing  the  quality  of  research  in  periodically
reviewed  programs  with  that  in  programs  reviewed  infrequently  on  an
ad  hoc  basis,  to  see  if  this  value  added  component  is  experimentally
verifiable.

Second  is  the  period  of  review  preparation,  particularly  the 
'dry  runs'  for  program  presentations.  This  is  an  extremely 
valuable  experience,  both  for  the  managers  and  the  researchers,  and 
would  by  itself  justify  the  cost  and  effort  of  the  total  review. 
Especially  for  research  program  peer  review,  the  preparation  period 
provides  a  focal  point  for  discussion  of  unresolved  issues  and
priorities,  and  fuels  substantive  discussions  in  order  to  arrive  at
a  quality  presentation.  The  value  added  is  not  in  the  superficial
improvement  of  presentation  form,  but  in  the  substantive  increase  in
the  intrinsic  program  quality.

Third  is  the  actual  review.  Here,  independent  viewpoints  are 
injected  in  a  public  forum,  high  quality  research  is  re-affirmed, 
and  strong  recommendations  are  provided  for  the  fate  of  poor 
research. 

A  fourth  time  of  value  added  could  be  postulated  as  well, 
depending  on  the  review  results.  If  the  review  outcome  was  very 
favorable,  and  eventually  resulted  in  additional  program  funding, 
then  value  was  added,  at  least  to  the  funding  recipients  and 
hopefully  to  the  larger  society  as  well. 

Finally,  it  should  be  remembered  that  all  of  the  review
processes  involve  real-time  judgements  of  the  quality  of  research,
not  expressions  of  the  intrinsic  quality  of  the  research.  The
passage  of  time  is  required  to  follow  the  evolution  of  research  to
ascertain  whether  it  achieves  its  promise.  How  well  these  peer
review  judgements  relate  to  the  actual  impact  of  the  research  on
science  and  technology  and  society  is  an  important  measure  of
long-term  peer  review  value,  and  is  addressed  to  some  extent  in  the
later  section  on  Predictability. 

Another  taxonomy  of  the  potential  values  added  by  peer  review 
can  be  summarized  as  [Chubin,  1994]:

1.  an  effective  resource  allocation  mechanism; 

2.  an  efficient  resource  allocator; 

3.  a  promoter  of  science  accountability; 

4.  a  mechanism  for  policymakers  to  direct  scientific  effort; 

5.  a  rational  process; 

6.  a  fair  process; 

7.  a  valid  and  reliable  measure  of  scientific  performance. 

Much  of  the  remainder  of  the  main  body  of  this  section 
examines  the  intrinsic  and  arbitrary  roadblocks  to  achieving  these 
desirable  goals  in  a  research  program  peer  review.  Many  of  the 
negative  aspects  of  program  peer  review  will  be  addressed,  such  as 
potential  bias,  cost,  and  protection  of  the  status  quo.  The 
present  sub-section  concludes  by  examining  briefly  another 
potentially  negative  aspect  of  peer  review  not  addressed  by  the 
literature;  namely,  whether  the  knowledge  of  periodically  scheduled 
reviews  would  stifle  the  pursuit  and  presentation  of  very 
innovative  but  far-out  ideas.  Would  performers  be  reluctant  to 
present  these  ideas  in  a  public  forum,  where  the  credibility  of  the 
performers  could  be  challenged  for  these  ideas?  In  other  words, 
does  the  practice  of  peer  review,  and  especially  panel-based 
program  peer  review,  effectively  result  in  self-censorship  of
radical  ideas?  This  is  an  area  where  research  is  needed  to 
ascertain  whether  ideas  have  been  suppressed  in  periodically 
reviewed  programs,  and  then  to  determine  how  this  problem  could  be 
surmounted  if  it  exists. 

QUALITY  OF  PEER  REVIEW


The  studies  related  to  peer  review  which  have  been  reported  in
the  literature  range  from  the  mechanics  of  conducting  a  peer 
review,  to  examples  of  peer  reviews,  to  detailed  critiques  of  peer 
reviews  and  the  process  itself.  In  addition  to  descriptions  of 
peer  reviews  and  processes  contained  in  the  reviews  and  surveys 
referenced  above,  other  examples  of  processes  and  critiques  can  be 
found  in  [Armstrong,  1997;  Chubin,  1990;  Chubin,  1994;  Barker, 
1992;  Cicchetti,  1991;  Cole,  1981;  DOE,  1993;  Frazier,  1987; 
Kostoff,  1995d].

While  the  reported  studies  of  peer  reviews  present  the  process 
mechanics,  the  procedures  followed,  and  the  review  results,  the 
reader  cannot  ascertain  the  quality  of  the  findings  and
recommendations  of  the  review.  In  practice,  procedure  and  process
quality  are  mildly  necessary,  but  nowhere  near  sufficient,  conditions
for  generating  a  high  quality  peer  review.  Many  useful  peer
reviews  have  been  conducted  using  a  broad  variety  of  processes,  and 
while  well  documented  modern  processes  (e.g.,  [DOE,  1993])  may 
contribute  to  the  efficiency  of  conducting  a  review,  more  than 
process  is  needed  for  high  quality.  Many  intangible  factors  enter 
into  a  high  quality  review  [Evans,  1990;  Friedman,  1995;  Goodman, 
1994;  Lundberg,  1991;  Luukkonen-Gronow,  1990;  McNutt,  1990;
Vandenbroucke,  1994],  and  some  of  the  more  important  factors  will
be  discussed. 

The  underlying  hypothetical  postulate  of  this  section  is  that 
there  exists  an  intrinsic  quality  inherent  in  every  basic  research 
task.  By  definition,  a  high  quality  peer  review  should  provide  an 
accurate  picture  of  this  intrinsic  quality  of  the  research  being 
reviewed,  irrespective  of  whether  this  intrinsic  quality  is  high  or 
low.  The  fundamental  problem  is  that  there  are  no  absolute 
standards  for  the  measurement  of  research  quality,  analogous  to 
physical  standards  for  primary  measurements  such  as  time  and 
length.  Presently,  evaluation  of  intrinsic  research  quality  is  a
subjective  process,  depending  on  the  perspectives  and  past
experiences  of  the  reviewers.  A  high  quality  review  under  these
imperfect  circumstances,  then,  would  be  defined  to  occur  when  two
generic  conditions  are  fulfilled:  1)  utilization  of  highly
competent  reviewers,  and  2)  no  injection  of  additional  distortions 
in  the  reviewers'  evaluations  as  a  result  of  biases,  conflict, 
fraud,  or  insufficient  work. 

High  quality  peer  review  processes  require  as  a  minimum  the 
conditions  summarized  from  Ormala  [Ormala,  1989]:

1.  The  method,  organization  and  criteria  for  an  evaluation 
should  be  chosen  and  adjusted  to  the  particular  evaluation 
situation; 

2.  Different  evaluation  levels  require  different  evaluation 
methods;

3.  Program  and  project  goals  are  an  important  consideration 
when  an  evaluation  study  is  carried  out; 

4.  The  basic  motive  behind  an  evaluation  and  the  relationships 
between  an  evaluation  and  decision  making  should  be  openly
communicated  to  all  the  parties  involved; 

5.  The  aims  of  an  evaluation  should  be  explicitly  formulated; 

6.  The  credibility  of  an  evaluation  should  always  be  carefully 
established; 

7.  The  prerequisites  for  the  effective  utilization  of 
evaluation  results  should  be  taken  into  consideration  in  evaluation 
design. 

The  impact  of  a  peer  review  on  decisionmaking  is  considered  as 
a  measure  of  its  effectiveness,  not  its  quality.  Poorly  conducted 
peer  reviews  could  theoretically  have  major  influences  on 
decisions,  and  well  conducted  peer  reviews  could  have  minimal 
influence  on  decisionmaking.  It  is  important  to  separate  peer 
review  quality  from  effectiveness. 

A  corollary  aspect  of  peer  review  quality,  although  in  the 
author's  judgement  not  a  primary  contributor  to  nominal  research 
program  peer  review  quality,  is  the  commission  of  errors  by  the 
reviewers.  The  author  is  not  aware  of  published  studies  which  have 
examined  the  commission  of  errors  by  research  program  peer 
reviewers.  In  a  recent  paper  [Armstrong,  1997],  different  studies 
of  errors  and  superficial  work  by  peer  reviewers  of  journal 
manuscripts  are  described.  The  conclusion  one  draws  from  these 
results  is  that  the  problem  of  manuscript  reviewer  error  production 
is  not  insignificant.  Armstrong  does  make  the  point  that  journal 
manuscript  peer  reviewers  typically  receive  no  extrinsic  rewards,
are  typically  anonymous,  and  therefore  in  some  cases  may  not  feel 
motivated  to  exert  the  effort  required  for  a  high  quality  review. 

There  is  somewhat  of  an  imbalance  in  this  author-reviewer 
symbiosis,  since  the  journal  article  author  spends  hundreds  of 
hours  performing  the  work  and  is  required  to  place  his  reputation 
on  the  line  when  submitting  the  article  for  publication,  while  the 
reviewer  spends  relatively  few  hours  at  his  task  with  essentially 
little  chance  of  damage  to  his  reputation  for  mediocre  performance. 
The  legal  system  recognizes  the  existence  of  these  human  frailties, 
and  has  a  multi-level  hierarchical  appeals  system  established  to 
handle  possible  errors  by  judges  and  juries.  The  medical/  legal 
system  also  has  effectively  an  appeals  procedure  established  by  its 
malpractice  system.  Perhaps  the  science  profession  needs  the 
establishment  of  a  somewhat  more  formal  appeals  system  to  level  the 
playing  field  for  manuscript  authors  and  others  subject  to  peer
review,  and  to  ensure  that  in  the  end  justice  will  be  served  and
quality  will  be  maintained.  A  recent  paper  [Stamps,  1997b]  reviews 
the  literature  on  conflict  resolution,  and  describes  a  process 
(dialectical  scientific  brief)  for  resolving  disputes  from 
manuscript  peer  review  in  scientific  journals.  This,  or  some 
alternative,  procedure  could  be  modified  to  apply  to  other  types  of 
scientific  peer  review  as  well. 

In  most  research  program  peer  reviews,  commission  of  technical 
errors  by  reviewers  due  to  the  relaxed  standards  resulting  from 
anonymity  and  lack  of  financial  incentives  is  probably  not  nearly 
as  serious  as  in  manuscript  reviews.  While  a  small  fraction  of 
program  reviews  may  be  carried  out  by  anonymous  mail  reviews  from
experts  (if  this  is  done  at  all,  it  would  apply  when  the  program  is
evaluated  by  reviewing  each  of  the  projects  separately),  the  vast
majority  of  program  reviews  are  carried  out  with  the  use  of  expert 
panels.  In  some  cases,  the  panel  members  may  receive  modest 
compensation,  but  in  any  case,  they  are  no  longer  anonymous.  Their 
reputations  are  on  the  line  as  they  participate  in  these  panels. 
In  the  author's  experience,  panel  members  tend  to  suppress  overt 
expressions  of  biases,  and  they  typically  make  statements  they  are 
able  to  defend.  Whether  this  translates  into  more  conservatism 
relative  to  the  anonymous  journal  manuscript  reviews  depends  on  how 
the  review  process  is  structured,  and  is  discussed  in  more  detail 
later  in  the  section  on  Secrecy.  In  any  case,  studies  of  the 
extent  of  errors  committed  by  research  program  peer  reviewers 
remain  to  be  done,  and  if  these  panels  eventually  have  substantial 
input  to  the  budgetary  process,  then  some  sort  of  appeals  system 
for  program  reviews  may  have  to  be  established. 

IMPACT  OF  PEER  REVIEW  MANAGER  ON  QUALITY 

From  the  author's  perspective,  the  single  most  important 
factor  in  producing  a  high  quality  research  program  peer  review  is 
the  dedication  of  an  organization's  senior  management  to  the 
highest  quality  objective  review,  and  the  associated  emplacement  of 
rewards  and  incentives  to  encourage  such  reviews.  The  second  most 
important  factor  in  producing  a  high  quality  review,  and  in  fact 
the  cornerstone  of  a  successful  review,  is  the  motivation  of  the 
person  managing  the  review  to  conduct  a  technically  credible 
review.  This  review  leader  selects  and  manages  the  review  process, 
selects  the  review  criteria,  selects  the  reviewers,  guides  the 
questions  and  discussions  in  a  panel  review,  summarizes  the 
reviewers'  comments  in  a  mail  or  panel  review,  and  makes 
recommendations  about  whether  a  program  should  be  initiated, 
continued,  or  modified. 

The  direction  of  the  assessment  may  be  heavily  influenced  if 
conscious  or  subconscious  biases  of  the  review  leader  are  exerted, 
especially  during  the  reviewer  selection  process.  In  an  extreme 
case  of  bias,  the  review's  results  could  be  determined  completely 
by  the  reviewer  selection  before  the  reviewers  ever  meet.  This 
conclusion  is  valid  for  the  manager  of  a  program  or  project  review, 
the  manager  of  a  proposal  review,  or  the  editor  in  charge  of  a 
journal  manuscript  review.  The  author  is  not  aware  of  any  of  these 
types  of  reviews  where  the  reviewers  are  selected  by  a  random 
process,  which  would  eliminate  much  of  the  selection  bias.  Because 
of  this  potential  intrinsic  bias  due  to  the  conscious  reviewer 
selection  by  the  review  manager,  unless  random  reviewer  selection 
is  operable  in  conducting  a  review,  any  mathematical  correlations 
[e.g.,  Cicchetti,  1991]  between  reviewers'  scores  and  review 
outcomes  (illuminating  and  insightful  though  they  may  be)  must  be 
opened  to  question. 


SELECTION  OF  PEER  REVIEWERS 

Even  with  the  strongest  support  from  an  organization's  top 
management,  and  the  direction  of  an  unbiased  and  competent  review 
leader,  the  quality  of  a  review  will  never  go  beyond  the  competence 
of  the  reviewers.  Two  dimensions  of  competence  which  should  be 
considered  for  a  research  review  are  the  individual  reviewer's 
technical  competence  for  the  subject  area,  and  the  competence  of 
the  review  group  as  a  body  to  cover  the  different  facets  of 
research  issues  (other  research  impacts,  technology  and  mission 
considerations  and  impacts,  infrastructure,  political  and  social 
impacts)  [Kostoff,  1995d,  1996a;  Garson,  1980;  Klahr,  1985; 
Marshall,  1996].  The  quality  of  a  review  is  limited  by  the  biases 
and  conflicts  of  the  reviewers.  The  biases  and  conflicts  of  the 
reviewers  selected  should  be  known  to  the  leader  and  to  each  other. 

One  common  error  in  panel  selection  is  limiting  the  choice  of 
research  experts  to  those  who  have  specific  expertise  in  the 
subdisciplines  of  the  existing  program.  This  provides  an  answer  to 
the  question  of  whether  the  job  is  being  done  right,  but  not  to 
whether  the  right  job  is  being  done.  The  former  question  relates 
to  detailed  technical  quality,  while  the  latter  question  relates 
more  to  investment  strategy  in  the  broadest  sense  (investment 
strategy  is  the  rationale  for  the  prioritization  and  allocation  of 
resources  among  the  program  components).  To  answer  the  latter 
question,  people  with  broad  expertise  in  the  area  covered  by  the 
overall  program's  highest  level  objectives  should  also  be  selected. 
They  would  be  able  to  address  the  investment  strategy  more 
objectively,  and  determine  whether  the  mix  of  subdisciplines,  and 
the  allocation  of  resources  among  the  subdisciplines,  is 
appropriate.  The  review  group,  then,  would  be  able  to  address  the 
central  question  of  whether  the  right  job  is  being  done  right. 

One  of  the  major  criticisms  of  peer  review,  whether 
manuscript,  proposal,  or  program,  is  that  it  tends  to  perpetuate 
orthodox  and  conservative  paradigms,  and  tends  to  reject  new 
paradigms  which  threaten  the  structure  of  the  status  quo.  If  one 
of  the  objectives  of  a  research  program  peer  review  is  in  fact  to 
ensure  that  innovation  is  recognized,  that  truly  revolutionary 
research  with  attendant  new  paradigms  will  be  promoted  and 
rewarded,  then  this  selection  of  reviewers  to  address  the  right  job 
issue  in  parallel  with  reviewers  to  address  the  job  right  issue 
becomes  of  paramount  importance. 

One  of  the  most  severe  deficiencies  of  many  present  research 
program  peer  reviews  is  the  concentration  of  panel  experts  on  the 
issue  of  doing  the  job  right  and  the  effective  absence  of  experts 
on  doing  the  right  job.  This  can  lead  to  the  situation  which  the 
author  has  termed  "The  Pied  Piper  Effect"  [Kostoff,  1997b].  This 
phenomenon  was  defined  initially  for  the  specific  case  of 
interpretation  of  journal  paper  citations,  but  it  is  applicable  to 
any  conclusion  resulting  from  any  type  of  peer  review  as  well: 
journal,  proposal,  program.  Its  initial  bibliometric  definition, 
and  then  extrapolation  to  program  peer  review,  follows. 

One  of  the  main  concerns  with  using  citations  as  a  stand-alone 
measure  of  quality  and  impact  has  been  the  potential  bimodal 
interpretation  of  the  numerical  results.  A  paper  could  receive 
high  citations  because  of  its  high  quality,  or  because  the  citers 
disagree  with  it.  However,  there  is  a  third  interpretation  which 
may  be  the  most  insidious,  and  further  precludes  citations  being 
utilized  in  stand-alone  mode:  the  "Pied  Piper"  effect. 

Assume  there  is  a  present-day  mainstream  approach  in  a 
specific  field  of  research;  for  example,  the 
chemical/radiation/surgical  approach  to  treating  cancer  (see  [Kostoff,  1997b]  for  a 
more  detailed  example  of  the  "Pied  Piper  Effect").  Assume  the 
following  hypothetical  scenario:  there  exist  alternative  approaches 
to  treatment  not  supported  by  the  mainstream  community;  in  fifty 
years  a  cure  for  cancer  is  discovered;  the  curative  approach  has 
nothing  to  do  with  today's  mainstream  research,  but  is  perhaps  a 
downstream  derivative  of  today's  alternative  methods;  it  turns  out 
that  today's  mainstream  approach  sanctioned  by  the  mainstream 
medical  community  was  completely  orthogonal  or  even  antithetical  to 
the  curative  approach.  Then  what  meaning  can  be  ascribed  to 
research  papers  in  cancer  today  which  are  highly  cited  for 
supposedly  positive  reasons? 

In  this  case,  a  paper's  high  citations  are  a  measure  of  the 
extent  to  which  the  paper's  author  has  persuaded  the  research 
community  that  the  research  direction  contained  in  his  paper  is  the 
correct  one,  and  not  a  measure  of  the  intrinsic  correctness  of  the 
research  direction.  It  is  analogous  to  firing  a  missile  accurately 
at  the  wrong  target.  In  fact,  the  high  citations  may  reflect  the 
deliberate  desire  of  a  closed  research  community  (the  author  and 
the  citers)  to  persuade  a  larger  community  (which  could  include 
politicians  and  other  resource  allocators)  that  the  research 
direction  is  the  correct  one. 

This  is  the  "Pied  Piper"  effect.  The  large  number  of 
citations  in  the  above  hypothetical  medical  example  becomes  a 
measure  of  the  extent  of  the  problem,  the  extent  of  the  diversion 
from  the  correct  path,  not  the  extent  of  progress  toward  the 
solution.  The  "Pied  Piper"  effect  is  a  key  reason  why,  especially 
in  the  case  of  revolutionary  research,  citations  and  other 
quantitative  measures  must  be  part  of  and  subordinate  to  a  broadly 
constituted  peer  review  in  any  credible  evaluation  and  assessment 
of  research  impact  and  quality. 

The  extrapolation  of  the  "Pied  Piper  Effect"  to  research 
program  peer  review  becomes  obvious.  Many  technical  communities 
are  comfortable  with  the  status  quo,  have  large  personal  and 
infrastructure  investments  in  the  mainline  orthodox  approaches,  and 
feel  threatened  by  new  paradigms  which  could  render  their 
investments  obsolete.  If  the  peer  reviewers  represent  only  the 
community  of  the  specific  research  approach  being  reviewed,  then 
the  debate  will  typically  center  around  the  correctness  of  the 
minuscule  details  of  the  approach  (job  right)  rather  than  whether 
the  approach  should  be  used  at  all  (right  job).  The  net  effect  of 
such  a  limited  review  is  to  provide  a  stamp  of  approval  (analogous 
to  the  high  citation  rates  described  above)  to  continuance  of  the 
mainline  approach,  and  to  close  the  door  to  revolutionary  thinking. 

Attachment  6  describes  a  method  for  selecting  peer  reviewers 
which  approximates  the  best  practices  in  use  today.  While  it  is 
not  a  pure  random  selection  process,  it  does  remove  much  of  the 
bias  of  present  selection  practices,  and  would  be  appropriate  for 
the  large  scale  program  peer  reviews  discussed  here. 

SELECTION  OF  EVALUATION  CRITERIA 

Research  evaluation  criteria  are  one  instrument  through  which 
an  organization  promulgates  strategic  and  policy  research 
objectives.  Detailed  responses  to  the  criteria  by  reviewers  are 
valuable  as  inputs  for  downstream  decisionmaking.  When  documented, 
review  criteria  also  serve  as  tangible  indicators  to  external 
groups  that  strategic  objectives  are  being  implemented  [Delcomyn, 
1991;  Eibeck,  1996;  Kellie,  1991;  Martin,  1981;  Sutherland,  1993; 
Weinberg,  1964,  1989]. 

Individual  criteria  can  be  viewed  mathematically  as  the 
components  of  a  vector.  The  complete  vector,  or  figure  of  merit  of 
the  review,  can  then  be  constructed  as  the  weighted  sum  of  the 
scores  of  its  components.  For  example,  assume  two  criteria, 
Research  Merit  (RM)  and  Mission  Relevance  (MR),  are  generated  by 
the  evaluating  organization  to  be  used  by  reviewers  for  research 
program  evaluation.  Assume  each  criterion  is  weighted  equally  by 
the  evaluating  organization.  Then,  in  the  absence  of  further 
constraints,  the  final  figure  of  merit,  overall  program  quality 
(OPQ),  is  computed  as  OPQ = 0.5*RM + 0.5*MR. 
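
The weighted-sum figure of merit described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, and the 1-to-5 scoring scale are assumptions made here and do not come from any agency's actual procedure.

```python
def figure_of_merit(scores, weights):
    """Return the weighted sum of criterion scores.

    scores and weights are dicts keyed by criterion name (hypothetical layout).
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical reviewer scores on an assumed 1-to-5 scale.
scores = {"RM": 4.0, "MR": 3.0}
# Equal weighting, as in the example in the text.
weights = {"RM": 0.5, "MR": 0.5}

opq = figure_of_merit(scores, weights)
print(opq)
```

With unequal weights the same function applies unchanged; only the weights dictionary differs.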

Problems  arise,  however,  because  the  stated  criteria  are 
seldom  the  only  criteria  considered  important  by  the  reviewers.  In 
the  case  above,  the  evaluating  organization  selected  only  two 
criteria  which  it  feels  are  important  and  which  it  wants  the 
reviewers  to  address.  It  also  selected  the  weighting  to  be 
assigned  to  each  criterion,  and  the  figure  of  merit  algorithm. 
Conflict  arises  because  each  reviewer  has  his  or  her  own  view  of 
what  criteria  are  important  for  evaluating  research,  how  these 
criteria  should  be  weighted  for  a  particular  program,  and  how  they 
should  be  integrated  for  a  final  figure  of  merit.  In  the  author's 
experience  covering  hundreds  of  different  types  of  peer  reviews, 
evaluators  actually  conceive  a  gestalt,  or  view  of  the  integrated 
nature,  of  the  total  research  package  when  performing  the 
evaluation.  The  component  criteria  serve  to  stimulate  reviewers' 
thinking  in  specific  areas,  and  ensure  that  the  reviewers  include 
issues  deemed  critical  to  the  review  managers. 

In  the  example  case,  there  is  the  potential  for  serious 
mismatch  between  the  final  figure  of  merit  vector  obtained  by  the 
organization's  algorithm  and  by  the  reviewers'  mental  algorithm. 
The  two  vectors  could  be  sufficiently  different  that  one  could 
completely  misrepresent  the  other.  For  example,  assume  the 
organization  provided  the  algorithm  above  to  the  reviewers,  and 
also  assume  that  the  definition  of  Research  Merit  (importance  of 
the  problem  to  science)  did  not  include  Research  Approach  (approach 
taken  to  solve  the  problem).  Assume  the  reviewers  felt  that  the  RM 
and  MR  were  high  quality  for  a  program  being  reviewed.  However, 
assume  that  the  reviewers  felt  the  Research  Approach  taken  was 
extremely  poor  in  the  program  under  review,  and  that  Research 
Approach  was  the  most  important  criterion  in  deciding  the  overall 
value  of  this  particular  research  program.  In  this  case,  use  of 
the  organization's  criteria  and  algorithm  will  provide  a  conclusion 
orthogonal  to  that  desired  by  the  reviewers.  Even  if  the 
organization  provides  the  additional  flexibility  of  allowing  the 
reviewers  to  provide  their  own  weighting  to  the  criteria,  in  the 
example  shown  the  reviewers'  desired  conclusion  will  still  be 
orthogonal  to  that  obtained  using  the  organization's  algorithm  with 
criteria  of  arbitrary  weighting. 
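
The mismatch in this example can be made concrete with a small numerical sketch. All numbers are hypothetical: a 1-to-5 scale is assumed, Research Approach is abbreviated RA, and the reviewers' "mental" weights are invented purely for illustration.

```python
def weighted_score(scores, weights):
    """Weighted sum over the criteria named in weights."""
    return sum(weights[name] * scores[name] for name in weights)


# Reviewers rate RM and MR highly but the Research Approach (RA) very poorly.
scores = {"RM": 5.0, "MR": 5.0, "RA": 1.0}

# The organization's algorithm sees only RM and MR. Sweep every split of the
# weight between them in steps of 0.1 to mimic reviewer-chosen weighting.
org_results = []
for step in range(11):
    w_rm = step / 10
    org_results.append(weighted_score(scores, {"RM": w_rm, "MR": 1.0 - w_rm}))

# Assumed reviewer view: RA dominates the overall judgement.
reviewer_weights = {"RM": 0.2, "MR": 0.2, "RA": 0.6}
reviewer_result = weighted_score(scores, reviewer_weights)

print(min(org_results), max(org_results), reviewer_result)
```

Because RA never enters the organization's formula, no choice of weights on RM and MR can pull its result down toward the reviewers' overall judgement.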

The  author  has  found  that  expert  reviewers  are  usually 
individuals  of  integrity,  and  the  way  they  resolve  the  above 
dilemma  is  through  the  principle  of  compromise  rather  than  the 
compromise  of  principles.  Operationally,  the  reviewers  develop  an 
intuitive  judgement  of  the  worth  of  the  total  research  package 
under  review,  then  'reverse-engineer'  the  weighting  and  scoring  of 
the  criteria  sub-consciously  (if  not  consciously)  until  the 
evaluation  algorithm  comes  closest  to  their  desired  intuitive 
overall  result. 

Based  on  these  observations,  the  author  recommends  (and  uses) 
inclusion  of  an  overall  project/program  quality  criterion  as  well. 
This  'bottom-line'  score  makes  clear  the  reviewers'  judgements 
about  the  total  research  package  presented,  and  incorporates  the 
effects  of  any  unstated  criteria  (e.g.,  organizational 
appropriateness)  which  a  reviewer  feels  are  important  determinants 
of  overall  research  quality.  This  approach  reduces  the  necessity 
for  'reverse  engineering'  to  arrive  at  displaying  the  reviewers' 
deepest  convictions.  If  the  evaluating  organization  still  wants  to 
use  only  its  own  criteria  to  arrive  at  the  final  figure  of  merit, 
then,  by  comparing  the  reviewers'  vector  and  the  organizational 
algorithmic  vector,  the  organization  can  identify  the  trade-off  in 
reviewer-perceived  quality  which  resulted  from  ignoring 
reviewer-relevant  criteria. 
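
One way to picture the comparison described above is to record, for each reviewer, both the criterion scores and the separate bottom-line score, and then look at the difference. The sketch below assumes a 1-to-5 scale and equal weights; the reviewer labels, field names, and numbers are hypothetical.

```python
def algorithmic_score(criterion_scores, weights):
    """Figure of merit from the organization's stated criteria only."""
    return sum(weights[name] * criterion_scores[name] for name in weights)


def quality_gap(review, weights):
    """Bottom-line score minus algorithmic score for one reviewer."""
    return review["bottom_line"] - algorithmic_score(review["criteria"], weights)


weights = {"RM": 0.5, "MR": 0.5}

# Hypothetical reviews: each carries criterion scores and a bottom-line score.
reviews = [
    {"reviewer": "A", "criteria": {"RM": 5.0, "MR": 5.0}, "bottom_line": 2.0},
    {"reviewer": "B", "criteria": {"RM": 4.0, "MR": 4.0}, "bottom_line": 3.0},
]

gaps = [quality_gap(review, weights) for review in reviews]
mean_gap = sum(gaps) / len(gaps)
print(gaps, mean_gap)
```

A consistently negative gap would suggest that some unstated criterion is lowering the reviewers' overall judgement relative to the organization's formula.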

The  later  section  in  this  paper  on  agency  peer  review 
practices  discusses  the  more  detailed  studies  performed  by  the 
author  and  others  on  selection  and  importance  of  research  program 
evaluation  criteria.  In  general,  these  studies  show  that  the  most 
influential  criteria  relative  to  a  reviewer's  final  evaluation 
rating  are  research  merit,  research  approach,  and  performer 
quality.  In  addition,  a  relevance  criterion  is  important  in 
mission  agencies.  Nearer-term  relevance,  such  as  transition  to 
technology  (or  utility),  tends  to  be  more  influential  on  a 
reviewer's  final  overall  rating  than  longer-term  relevance  to  the 
sponsor's  downstream  mission.  Finally,  as  stated  above,  inclusion 
of  a  single  'bottom-line'  criterion  is  crucial. 

SECRECY:  REVIEWER  AND  PERFORMER  ANONYMITY 

The  issue  of  reviewer  anonymity  was  discussed  briefly  in  the 
section  on  Quality,  with  the  conclusion  that  detailed  technical 
quality  of  the  reviewer's  product  was  not  helped  by  the  anonymity. 
From  the  author's  viewpoint,  this  negative  aspect  pales  compared  to 
the  benefits  resulting  from  reviewer  anonymity,  although  there  is 
not  a  unanimity  of  opinion  on  this  conclusion  in  the  literature 
[Altura,  1990;  Berezin,  1994;  Clayson,  1995;  Debakey,  1990;  Frei, 
1993;  Gresty,  1995;  Knox,  1981;  Neetens,  1995]. 

What  is  really  desired  from  a  peer  reviewer  is  an  honest 
viewpoint  on  the  intrinsic  quality  of  research  under  review, 
supported  by  rigorous  technical  analysis  where  possible.  Having 
the  reviewer  and  reviewee  present  during  the  review  (and  this 
applies  to  manuscript,  proposal,  and  program  review;  'present'  just 
must  be  interpreted  differently  in  each  case)  will  sharpen  the 
quality  of  the  technical  discussion  details,  and  eliminate  many  of 
the  types  of  errors  reported  in  the  studies  [Armstrong,  1997] 
discussed  earlier  in  the  Quality  section. 

However,  having  the  reviewer  and  reviewee  present  during  the 
review  will,  in  many  cases,  inhibit  the  expression  of  the 
reviewer's  deepest  convictions  about  the  quality  of  the  research. 
Rewards  are  few  for  making  strong  negative  statements  about  a 
research  paper,  proposal,  or  program,  and  resulting  retributions 
and  resentments  may  far  outweigh  the  intrinsic  benefits  of  honest 
and  forthright  judgement  statements.  In  a  research  program  peer 
review  in  particular,  the  situation  is  more  complex  than  a 
manuscript  peer  review.  In  program  review,  it  is  the  program 
manager  who,  in  a  real  sense,  is  being  reviewed,  as  well  as  the 
research.  If  the  reviewers  are  'bench-level'  experts  in  the  field 
of  the  manager's  research  program,  as  one  assumes  they  typically 
are,  and  at  some  point  in  the  future  would  have  an  interest  in 
participating  in  the  manager's  specific  research  program,  then 
forthright  but  negative  reviews  could  have  potentially  serious 
consequences  on  their  ability  to  obtain  future  funding  from  the 
program  manager.  Finding  true  peers  to  serve  as  research  program 
reviewers  in  this  case  may  be  extremely  difficult,  and  requires 
judicious  care  in  the  selection  process. 

The  author  has  conducted  program/proposal  reviews  which  span 
the  gamut  from  complete  reviewer  anonymity  to  complete  reviewer 
presence  with  reviewee  and  audience.  In  the  author's  experience, 
there  is  a  hierarchy  of  levels  of  reviewer  anonymity  which  produce 
different  degrees  of  frankness  and  honesty  in  the  reviewer's 
response. 

The  most  honest  and  straightforward  reviewer's  opinions  result 
from  phone  reviews  where  the  reviewer  is  completely  anonymous  to 
the  reviewee.  In  this  case,  the  reviewer  has  been  provided 
information  about  the  research  (typically  written)  and  provides 
feedback  orally  over  the  phone.  The  frankness  of  response  is  most 
evident  in  evaluating  the  right  job  function,  where  the  integrity 
of  the  total  research  approach  is  at  stake.  Reviewers  are  less 
reluctant  to  be  open  when  critiquing  the  job  right  function, 
since  major  direction  and  infrastructure  changes  will  not  be  at 
risk,  and  the  reviewee's  defenses  will  not  be  as  vociferous. 

Next  in  the  hierarchy  are  written  reviews  where  the  reviewer 
is  completely  anonymous  to  the  reviewee.  Some  reviewers  will  tend 
to  moderate  the  frankness  of  their  comments  when  asked  to  provide 
them  in  writing.  However,  if  the  reviewers  trust  the  review 
manager  to  protect  their  anonymity,  they  will  still  be  quite  frank 
in  their  writeups. 

The  next  level  of  anonymity  occurs  when  the  reviewers  and 
reviewees  are  both  present  during  the  research  presentations,  but 
the  reviewers  meet  in  closed  session  to  provide  oral  and  written 
evaluations  of  the  research,  with  these  evaluations  not  for 
attribution.  Even  anonymity  limited  to  the  closed  session  will 
produce  much  frank  discussion  and  exchange  of  heartfelt  opinion. 

The  final  level  is  the  absence  of  anonymity,  where  both 
reviewers  and  reviewees  are  present  throughout  the  total  process, 
and  all  verbal  and  written  comments  are  provided  with  full 
attribution.  While  it  may  be  argued  that  this  type  of  review  is 
better  than  having  no  review,  from  the  author's  experience  this 
approach  does  not  begin  to  utilize  the  full  potential  of  what 
expert  peer  review  can  offer. 

The  other  side  of  the  secrecy  coin  is  withholding  the 
reviewee's  name  and  affiliation  from  the  reviewer.  This  process 
has  been  termed  "blind  reviewing"  [Blank,  1991;  Ceci,  1984;  Cox, 
1993;  Evans,  1990;  Fisher,  1994;  Johnson,  1995;  Laband,  1994; 
McNutt,  1990;  Nylenna,  1994;  Rosenblatt,  1980;  Shaughnessy,  1988; 
Sly,  1990].  Its  objectives  are  to  provide  fairer  reviews  of  work 
by  unknown  researchers  or  by  researchers  from  less  prestigious 
institutions  [Armstrong,  1997],  or  conceivably  to  eliminate  bias 
based  on  personal  characteristics  such  as  gender.  Blind  reviewing 
(and  its  corollary  "double-blind"  reviewing,  when  both  the  reviewer 
and  reviewee  are  anonymous  to  each  other)  is  probably  most 
applicable  to  manuscript  review.  Some  studies  of  blind  reviewing 
for  journal  manuscripts  have  been  reported  [Fletcher  and  Fletcher, 
1997;  Fisher,  1994;  Laband,  1994].  Reviews  by  blinded  reviewers 
were  judged  by  the  editors  to  have  higher  quality;  the  blinded 
reviewers  gave  better  scores  to  authors  with  more  previous 
articles,  and  articles  published  in  journals  using  blinded  peer 
review  were  cited  significantly  more  than  articles  published  in 
journals  using  non-blinded  peer  review. 

Unfortunately,  removing  the  identity  of  the  reviewee  from  the 
research  under  review  is  akin  to  solving  an  equation  after 
eliminating  the  dominant  term.  The  DOE  peer  review  study  of  the 
quality  of  its  Office  of  Basic  Energy  Sciences'  research  program 
[DOE,  1982],  which  is  probably  the  classic  study  of  research 
program  quality  using  a  statistical  sampling  of  component  project 
quality,  concluded  that  team  quality  was  the  most  important 
variable  in  determining  overall  project  quality.  Based  on  this 
and  other  similar  results,  evaluating  proposals  without  reviewee 
identity  could  provide  misleading  results.  There  are  many  good 
proposed  research  topics  in  existence.  The  high  quality  researcher 
will  develop  a  track  record  of  not  only  addressing  good  research 
topics,  but  through  perseverance  and  critical  thought  will  make 
substantial  progress  toward  solutions.  Today,  there  exist  many 
consulting  firms  that  will  assist  researchers  in  preparing  funding 
proposals.  These  consultants  are  very  aware  of  the  appropriate 
'buzzwords'  and  politically  correct  terminology,  and  what  type  of 
formatting  and  proposal  organizational  structure  will  appeal  most 
to  decision  makers.  Judging  such  proposals  independent  of  the 
researcher  will  eventually  allow  form  to  predominate  over 
substance. 

In  any  case,  blind  reviews  probably  have  minimal  applicability 
to  research  program  reviews.  In  most  cases,  panel  reviews  are  used, 
and  extraordinary  precautions  would  have  to  be  taken  to  protect  the 
identity  of  the  reviewees.  Coupled  with  the  inability  to  use  the 
team  quality  criterion,  there  appears  to  be  little  motivation  to 
employ  this  process  in  program  peer  review.  There  appears  to  be 
nothing  on  this  topic  related  to  program  review  in  the  literature. 

OBJECTIVITY/BIAS/FAIRNESS  OF  PEER  REVIEW 

Probably  the  most  criticized  aspect  of  all  types  of  peer 
review  is  the  role  of  bias,  and  its  subsequent  impact  on  fairness, 
in  the  final  recommendations  of  the  reviewers.  Peer  reviews  have 
received  written  and  verbal  accusations  of  having  gender  bias,  race 
bias,  institutional  bias,  geographic  bias,  age  bias,  and  especially 
a  conservative  bias  toward  protecting  the  'old  boy's'  network  of 
the  status  quo.  Much  research  effort  has  been  focused  on  this 
issue  of  bias  and  fairness  [Armstrong,  1982,  1997;  Bailar,  1991; 
Daniel,  1993;  Ehlen,  1996;  Ernst,  1994;  Ramasarma,  1995;  Spitzer, 
1994];  Armstrong  [Armstrong,  1997]  makes  the  point  that  almost  half 
of  the  empirical  papers  on  journal  reviewing  in  a  recent  massive 
study  [Speck,  1993]  address  these  issues. 

The  findings  are  mixed.  A  recent  study  [Gilbert,  1994] 
assessed  whether  manuscripts  received  by  the  JAMA  possessed 
differing  peer  review  and  manuscript  processing  characteristics,  or 
had  a  variable  chance  of  acceptance,  associated  with  the  gender  of 
the  participants  in  the  peer  review  process.  The  study  concluded 
that  gender  differences  exist  in  editor  and  reviewer 
characteristics  at  JAMA  with  no  apparent  effect  on  the  final 
outcome  of  the  peer  review  process  or  acceptance  for  publication. 

Another  study  [Peters,  1982]  found  that  reviewers  were  biased 
against  authors  from  unknown  or  less-prestigious  institutions.  A 
study  in  which  NSF  proposal  reviews  were  re-evaluated  by  a 
different  panel  [Cole,  1981]  included  institutional  reputation, 
professional  age,  academic  rank,  geographic  location,  and  other 
variables.  It  concluded  that  the  peer  review  system  employed  by 
NSF  was  essentially  free  of  systematic  bias.  A  study  of  the  DOE 
Office  of  Basic  Energy  Sciences  [DOE,  1982]  stated  that  the 
conclusions  concerning  the  laboratory  and  non-laboratory  projects 
were  not  distorted  by  reviewer  biases. 

A  1992  report  elaborates  on  the  concerns  of  bias  and  conflict 
in  a  section  describing  guidelines  on  a  common  framework  for 
organizing  Federal  investments  [NAS,  1992].  Its  Principle  6 
(Program  Evaluation)  contains  the  statement:  "Current  efforts  to 
review  government  R&D  programs  have  suffered,  in  some  instances, 
from  the  fact  that  annual  reports  to  Congress  or  the  executive 
branch  have  been  conducted  by  mission  agency  employees  with  a 
direct  interest  in  having  projects  they  evaluate  continue. 
Technical  evaluations  of  the  R&D  work  and  of  the  contributions  to 
national  economic  welfare  of  pre-commercial  R&D  programs  should  be 
conducted  by  nongovernmental  groups  that  do  not  have  a  direct  role 
in  program  management  or  funding  decisions". 

The  underlying  paradigm  of  the  bias/fairness  issue  is  that 
all  reviewees  should  be  treated  the  same;  there  should  be  a  level 
playing  field  for  all  players.  Unfortunately,  in  the 
implementation  of  this  noble  philosophy,  the  rules  of  scientific 
evidence  take  second  priority  to  the  rules  of  political 
correctness.  This  motivation  toward  perceived  increased  fairness 
is  probably  the  main  driver  for  peer  review  concepts  such  as  'blind 
reviewing',  which  was  addressed  in  the  previous  section  of  this 
paper  on  Secrecy.  It  was  concluded  that  the  downside  to  "blind 
reviewing"  was  the  elimination  of  the  key  reviewer  criterion  of 
track  record  (team  quality)  and  the  subsequent  degradation  of  the 
review  process  quality. 

However,  assigning  overwhelming  importance  to  track  record,  as 
proposed  by  some  researchers  in  the  later  Alternatives  section  of 
this  paper,  shifts  the  functional  balance  toward  emphasizing  the 
job  right  aspect  of  the  research  as  opposed  to  the  right  job 
aspect,  and  is  in  many  respects  a  double-edged  sword.  It  presents 
serious  obstacles  for  young  researchers  with  little  track  record 
who  may  have  very  good  ideas  for  solving  difficult  research 
problems  and  may  be  very  capable  of  addressing  these  problems,  and 
has  the  potential  for  maintaining  the  'old  boy's'  network  and  the 
status  quo.  This  can  have  very  serious  consequences,  as  the 
discussion  of  the  "Pied  Piper  Effect"  showed  in  the  previous 
section.  The  solution  to  this  paradox  is  not  to  eliminate  the  key 
variable  of  researcher  identity,  but  rather  to  select  reviewers 
such  that  the  perspective  of  the  panel  is  broadened.  Use  panelists 
who  are  able  to  address  the  right  job  aspects  of  the  research 
target,  to  ensure  that  outmoded  but  prolific  and  well-cited 
research  is  not  promulgated  in  perpetuity,  and  that  the  pool  of 
expertise  is  being  continually  refilled. 

NORMALIZATION  OF  PEER  REVIEW  PANELS 

Peer  review  is  a  diagnostic  process  which  can  be  applied  in 
isolation  on  a  body  of  research,  or  can  be  used  for  comparing  many 
different  types  of  research.  When  applied  for  comparative 
purposes,  a  key  issue  centers  around  how  the  results  of  different 
panels  evaluating  different  technical  disciplines  can  be  normalized 
such  that  comparisons  across  disciplines  and  panels  become 
meaningful.  How,  for  example,  can  the  differences  in  intrinsic 
quality  of  the  different  types  of  research  being  reviewed  be 
separated  from  different  panel  biases,  different  panel 
interpretations  of  criteria,  and  different  severities  of  panelists  in
applying  the  criteria,  when  only  scores  and  comments  which  include
all  these  factors  are  presented?  This  normalization  issue  is
perhaps  the  most  difficult  aspect  of  peer  review,  and  normalization 
difficulty  also  applies  to  other  aspects  of  research  evaluation 
such  as  bibliometrics  [Braun,  1982;  Kostoff,  1997m;  Schubert, 
1996].

Most  studies  which  examine  peer  reviews  across  disciplines
present  the  results  for  the  major  discipline  categories  separately 
[e.g.,  DOE,  1982;  Cicchetti,  1991;  Cole,  1981].  They  essentially
finesse  the  problem.  While  this  separation  of  categories  is  valid 
when  research  is  viewed  from  a  strategic  viewpoint,  where 
disciplines  are  selected  and  maintained  for  their  importance  to  an 
organization's  mission,  this  discipline  separation  reduces  the 
value  of  peer  review  as  a  quality  comparative  yardstick 
considerably.  Quantitative  evaluation  approaches,  such  as 
bibliometrics,  develop  reference  standards  for  different 
disciplines  and  then  construct  appropriate  scaling  procedures  for 
ranking  the  research  [Schubert,  1996] .  This  does  allow  for 
comparison  of  relative  rankings  across  disciplines  in  a  broad 
generic  sense,  but  questions  arise  as  to  the  applicability  of 
reference  standards  defined  for  a  discipline  (e.g.,  acoustics)  to 
programs  being  compared  within  the  discipline  (e.g.,  underwater 
acoustics  vs.  aeroacoustics).

The  author  has  not  seen  any  fully  satisfactory  peer  review 
normalization  approaches  due  to  the  presence  of  the  many  variables 
listed  previously.  However,  one  interesting  normalization  approach 
is  used  by  the  Dutch  STW  for  evaluating  research  proposals  [Van  den 
Beemt,  1991,  1997].  Technical  comments,  but  not  quality  ratings, 
are  provided  by  technical  peers.  The  comments,  and  proposer 
responses,  for  twenty  different  proposals  are  then  provided  to 
twelve  people  from  a  variety  of  disciplines.  This  'jury'  of  twelve 
provides  the  scores  through  an  independent  mail  review. 
Essentially,  the  normalization  is  provided  by  having  the  twelve 
jurors  common  to  all  proposals. 

The  author  has  used  two  approaches  to  improve  normalization
across  panels  somewhat.  First  is  the  utilization  of  some
individuals  common  to  all  panels.  In  a  series  of  competitions  for
new  accelerated  research  programs  that  was  held  in  the  late  1980s
[Kostoff,  1988],  the  author  served  as  chairman  of  all  the  different 
discipline  panels.  This  resulted  in  some  small  measure  of 
normalization  among  the  different  panels.  Use  of  more  individuals 
common  to  all  panels  would  have  provided  an  extra  measure  of 
normalization,  and  in  this  sense  the  presence  of  senior  management 
during  the  reviews  provided  additional  measures  of  normalization. 
Obviously,  the  more  closely  the  panels  are  related  topically,  the 
more  valuable  is  the  technical  contribution  of  individuals  common 
to  the  different  panels. 

Second,  it  was  assumed  that  the  difference  in  aggregated 
average  scores  for  major  disciplines  (e.g.,  physical  sciences  and 
life  sciences)  was  due  to  two  factors:  differences  in  intrinsic
quality  of  the  programs  proposed  and  differences  in  the  scoring 
severity  of  the  reviewers.  To  normalize,  a  fraction  of  the 
differences  in  aggregated  average  scores  for  the  major  disciplines 
was  removed.  This  was  assumed  to  eliminate  the  scoring  severity 
difference.  Trial  and  error  showed  a  fifty  percent  correction 
factor  provided  results  which  appeared  intuitively  reasonable  to 
the  relevant  audience  members  who  had  attended  all  the  reviews. 
This  normalization  procedure  had  the  added  benefit  of  preserving 
and  insuring  representation  from  disciplines  which  had  strategic
value  to  the  organization. 
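
The correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the fraction was removed, so the sketch assumes each discipline's scores are shifted toward the grand average of the discipline averages. The function name `normalize_scores` and the sample scores are hypothetical; only the fifty percent correction factor comes from the text.

```python
from statistics import mean


def normalize_scores(panel_scores, correction=0.5):
    """Remove a fraction of the gap between each discipline average
    and the grand average of all discipline averages.

    panel_scores: dict mapping a discipline name to a list of program scores.
    correction:   fraction of the gap attributed to scoring severity
                  (0.5 is the trial-and-error value quoted in the text).
    """
    panel_means = {name: mean(scores) for name, scores in panel_scores.items()}
    grand_mean = mean(list(panel_means.values()))
    return {
        name: [s - correction * (panel_means[name] - grand_mean) for s in scores]
        for name, scores in panel_scores.items()
    }


# Hypothetical scores, for illustration only.
raw = {
    "physical sciences": [8.0, 7.0, 9.0],
    "life sciences": [5.0, 6.0, 4.0],
}
adjusted = normalize_scores(raw)
```

With these sample numbers the gap between the two discipline averages is halved, while the ranking of programs within each discipline is unchanged, since every score in a discipline is shifted by the same amount.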

This  approach  to  normalization  could  have  a  second
interpretation.  If  the  research  is  viewed  as  having  a  strategic 
component  and  a  quality  component,  with  the  reviewers'  scores 
viewed  as  addressing  the  quality  component  only,  then  the 
correction  could  be  perceived  as  adjusting  for  the  presence  of  the 
strategic  component.  For  example,  assume  a  Life  Sciences  panel 
produced  an  average  program  score  of  five,  and  an  Engineering 
Sciences  panel  produced  an  average  score  of  ten.  Assume  further 
that  each  discipline  had  equal  strategic  value  to  the  organization, 
and  that  the  strategic  value  was  of  equal  importance  to  the 
reviewers'  scores  (assumed  to  be  a  total  program  quality  score 
which  includes  mission  relevance).  Then  the  normalized  total  score
can  be  computed  as  FOM  =  0.5*STRAT  +  0.5*SCORE,  and  the  difference 
between  the  two  panels'  scores  would  be  reduced  from  five  to  2.5. 
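
The arithmetic of this example can be checked with a short Python sketch. The formula and the panel averages of five and ten are from the text; the common strategic value `STRAT = 8.0` is a hypothetical number chosen only for illustration, and any value shared by both disciplines gives the same difference.

```python
def figure_of_merit(strat, score, w_strat=0.5, w_score=0.5):
    """FOM = w_strat * STRAT + w_score * SCORE, as given in the text."""
    return w_strat * strat + w_score * score


STRAT = 8.0  # hypothetical strategic value, assumed equal for both disciplines

fom_life = figure_of_merit(STRAT, 5.0)          # Life Sciences panel average
fom_engineering = figure_of_merit(STRAT, 10.0)  # Engineering Sciences panel average
gap = fom_engineering - fom_life                # half of the raw five-point gap
```

Because the strategic term is identical for both disciplines, it cancels in the difference, and only the weight on the reviewers' score determines how much of the original gap survives.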

If  peer  review  is  eventually  used  to  support  GPRA,  then  some 
sort  of  normalization  procedure  will  be  required  for  credibility. 
Given  the  very  limited  validity  of  existing  schemes  for 
normalization,  especially  across  disparate  disciplines,  this  will 
be  difficult.  If  GPRA  is  used  to  affect  research  budgets,  valid 
procedures  to  normalize  scores  will  be  essential,  and  they  do  not 
exist  now.  This  is  a  very  fertile  area  for  peer  review  research. 

REPEATABILITY/  RELIABILITY  OF  PEER  REVIEW 

In  a  physical  system  experiment,  one  of  the  main  questions 
asked  to  gauge  credibility  of  the  results  concerns  the 
repeatability  of  the  results.  Can  the  same  experiment  be  run  at 
different  laboratories  under  the  same  controlled  conditions  and 
yield  the  same  results,  or  some  reasonable  facsimile  thereof?  The 
analogous  issue  in  peer  review  has  been  termed  alternatively 
reliability,  repeatability,  consistency,  uniformity,  etc.,  and  has 
received  much  focus  in  the  literature  [Bailar,  1991;  Ceci,  1982; 
Cicchetti,  1976,  1979,  1991;  Cole,  1991;  Colman,  1991;  Crothers, 
1993;  Daniel,  1993;  Gorman,  1991;  Halpin,  1986;  Kiesler,  1991; 
Kraemer,  1991;  Laming,  1991;  Luce,  1993;  Marsh,  1989;  Roediger, 
1991;  Rosenthal,  1990,  1991;  Rubin,  1992].  The  meaning  is  the
same. 

There  are  two  corollary  concepts  in  physical  systems  which 
unfortunately  are  not  always  carried  over  to  peer  reviews.  These 
are  the  concepts  of  precision  and  accuracy.  Precision  represents 
the  degree  to  which  a  measurement  value  can  be  replicated,  while 
accuracy  represents  the  relation  of  the  measurement  value  to  some 
absolute  value  or  standard. 
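
The distinction can be made concrete with a minimal Python sketch. It assumes a reference value is available, which is rarely the case for peer review scores; the function name and the sample numbers are hypothetical.

```python
from statistics import mean, stdev


def precision_and_accuracy(measurements, reference):
    """Return (spread, offset) for repeated measurements of one quantity.

    spread: sample standard deviation; a small spread means high precision.
    offset: mean minus the reference value; a small offset means high accuracy.
    """
    spread = stdev(measurements)
    offset = mean(measurements) - reference
    return spread, offset


# Hypothetical repeated scores for one proposal whose "true" score is taken as 5.0.
spread, offset = precision_and_accuracy([7.0, 7.1, 6.9, 7.0], reference=5.0)
```

In this illustration the scores cluster tightly (high precision) yet sit about two points above the reference (low accuracy), which is the situation the text warns is not always recognized in peer review.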

In  a  very  comprehensive  study  of  the  reliability  of  peer 
review  for  manuscripts  and  grant  proposals  [Cicchetti,  1991],  which 
included  hundreds  of  references,  reliability  was  defined 
generically  by  different  measures:  internal  consistency, 
interreferee  agreement  (degree  of  agreement  among  referees) ,  and 
stability  across  time.  Reliability  by  these  definitions  appears  to 
be  the  analog  of  precision  as  defined  above,  and  the  issue  of 
accuracy  does  not  appear  to  enter  the  definition.  The  study  stated
that  the  most  common  measure  is  interreferee  agreement  at  a  given
point  in  time.  The  study  essentially  concluded  that,  across  the
various  science  disciplines  examined:  1)  agreement  is  better  on 
manuscript  and  grant  submissions  of  perceived  poor  quality  than  on 
submissions  of  good  quality;  2)  better  defined  (specific  and 
specialized)  areas  of  scientific  inquiry  have  higher  acceptance 
rates  and  use  fewer  reviewers  than  less  well-defined  (general  and 
less  focused)  areas  of  scientific  interest;  and  3)  levels  of 
chance-corrected  interreferee  agreement  are  rather  low. 
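
For readers unfamiliar with chance correction, the following Python sketch computes Cohen's kappa for two referees. Kappa is one widely used chance-corrected agreement statistic; the text does not state which statistic the cited study relied on, so its use here is an assumption, and the referee decisions are invented for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two referees rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both referees must rate the same non-empty set of items")
    n = len(ratings_a)
    # Fraction of items on which the two referees gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each referee's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical accept (A) / reject (R) decisions on ten submissions.
referee_1 = ["A", "A", "A", "A", "A", "R", "R", "R", "R", "R"]
referee_2 = ["A", "A", "A", "R", "R", "R", "R", "R", "A", "A"]
kappa = cohens_kappa(referee_1, referee_2)
```

Here the referees agree on six of ten submissions, but half of that agreement would be expected by chance alone, so kappa comes out near 0.2. This is how a seemingly respectable raw agreement rate can correspond to a low chance-corrected value.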

However,  neither  the  study  commentary  nor  the  descriptions  of 
the  referenced  studies  addressed  the  issue  of  truly  random  reviewer 
selection,  and  therefore  the  meaning  of  the  study  conclusions  is 
open  to  question.  For  example,  what  is  the  meaning  of  high 
reliability  under  these  conditions?  It  could  mean  that  the
reviewers  were  able  to  identify  and  report  accurately  on  the 
intrinsic  quality  of  the  manuscript/  proposal,  or  it  could  mean 
that  the  reviewers  were  selected  because  of  their  extreme  bias 
(positive  or  negative)  toward  the  topic  and  the  review  manager  did 
an  outstanding  job  of  selecting  reviewers  with  similar  biases. 

In  addition,  there  is  a  school  of  thought  that  chance- 
corrected  interreferee  agreement  should  in  fact  be  low,  because  the 
astute  manager  will  pick  reviewers  who  have  sharply  different 
viewpoints  and  expertise,  so  that  they  should  be  sensitive  to 
different  kinds  of  problems.  From  this  perspective,  too  much 
agreement  may  be  a  sign  of  weakness,  that  the  system  is  not 
eliciting  the  full  spectrum  of  opinion  that  the  manager  needs  to 
make  an  informed  decision. 

A  study  of  National  Science  Foundation  (NSF)  proposals  [Cole, 
1981],  funded  by  NSF,  using  two  sets  of  reviewers,  showed  a 
reversal  rate  (one  group's  decision  would  have  been  reversed  by  the 
other  group)  of  about  twenty-five  percent.  Since  an  entirely 
random  process  would  have  produced  a  reversal  rate  of  fifty 
percent,  it  was  concluded  that  the  fate  of  a  particular  grant 
application  is  roughly  half  determined  by  the  characteristics  of 
the  proposal  and  the  principal  investigator,  and  about  half  by 
apparently  random  elements.  It  was  also  concluded  that  the  great 
bulk  of  reviewer  disagreement  observed  is  probably  a  result  of  real 
and  legitimate  differences  of  opinion  among  experts  about  what  good 
science  is  or  should  be. 
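
The "roughly half" conclusion follows from placing the observed reversal rate on a scale between zero (fully determined by the proposal) and the fifty percent expected from a purely random process. The sketch below shows that arithmetic; the linear scaling and the function name are this illustration's reading of the argument, not a formula taken from the cited study.

```python
def nonrandom_fraction(observed_reversal, random_reversal=0.5):
    """Fraction of the outcome not attributable to chance, assuming the
    reversal rate scales linearly from 0 (fully determined) to
    random_reversal (fully random)."""
    return (random_reversal - observed_reversal) / random_reversal


determined = nonrandom_fraction(0.25)  # reversal rate of about 25 percent
random_part = 1.0 - determined
```

With a twenty-five percent reversal rate, both `determined` and `random_part` equal one half.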

Similar  reliability  studies  of  research  program  reviews  do  not 
appear  to  be  in  the  literature,  probably  because  of  the  expense  and 
effort  of  doing  the  replication  involved  in  such  studies, 
especially  for  panel  reviews,  and  the  question  of  whether  the 
identical  process  is  actually  being  replicated.  The  author's 
experience  with  reviews  of  existing  and  proposed  research  programs, 
a  small  fraction  of  which  was  documented  and  analyzed 
mathematically  [Kostoff,  1992a],  is  that  reliability  is  sufficient 
for  practical  purposes.  While  a  peer  review  can  gain  consensus  on 
the  proposed  and  existing  research  programs  that  are  either
outstanding  or  poor,  there  will  be  differences  of  opinion  on  the 
programs  that  cover  the  much  wider  middle  range.  For  programs  in 
this  middle  range,  their  fate  is  somewhat  more  sensitive  to  the
reviewers  selected.  If  a  key  purpose  of  a  peer  review  is  to  insure 
that  the  outstanding  programs  are  funded  or  continued,  and  the  poor 
programs  are  either  terminated  or  modified  strongly,  then  the 
capabilities  of  the  peer  review  instrument  are  well  matched  to  its 
requirements.

The  author's  experience  with  the  reliability  of  program  peer 
reviews  appears  to  be  somewhat  less  negative  than  those  above,  or 
other  similar  studies  reported  in  the  literature.  Why  is  this?  It 
probably  is  due  in  large  measure  to  how  the  peer  review  is 
conducted.  In  many  proposal  and  manuscript  reviews  reported  in  the 
literature,  there  tends  to  be  minimal  feedback  among  the  reviewers, 
and  between  the  reviewers  and  authors/  proposers.  Probably  at  best 
there  is  one  written  rebuttal.  This  independence  is  undoubtedly 
valued,  and  is  also  less  expensive  than  convening  all  the  players 
to  interact  jointly. 

The  author's  peer  reviews  involve  extensive  interaction  among 
the  reviewers  and  presenters.  Many  misunderstandings  and 
differences  in  interpretation  are  clarified  during  the  exchange  of 
technical  information  before  the  scoring  is  performed.  The  initial 
scoring  is  performed  independently  by  the  reviewers.  Then, 
differences  in  scores  are  discussed,  and  the  reviewers  are  provided 
the  opportunity  to  modify  their  scores.  Usually,  the  final  scores 
become  closer.  From  the  author's  observations,  this  scoring 
variance  reduction  is  not  due  to  the  dominance  of  more  forceful  or 
vociferous  debaters,  but  rather  is  due  to  each  reviewer's  coming  to 
a  better  understanding  of  the  intrinsic  nature  of  the  material 
presented.  Thus,  rather  than  interreviewer  agreement  as  the 
measure  of  reliability  used  for  the  journal  manuscript  analyses 
[Cicchetti,  1991],  for  research  program  peer  review  a  better
measure  of  reliability  may  be  agreement  of  average  panel  scores 
after  panels  are  conducted  in  the  interactive  mode  suggested  above. 

EFFECTIVENESS/  PREDICTABILITY  OF  PEER  REVIEW 

The  issue  of  peer  review  predictability  affects  the 
credibility  of  technological  forecasting  directly.  For  an 
organization  conducting  peer  review  of  research,  it  would  be 
desirable  to  relate  the  reviewers'  scores  to  downstream  impacts  on 
the  organization's  mission  [Abrams,  1991;  Van  den  Beemt,  1991, 
1997].  A  few  studies  have  been  done  relating  reviewers'  scores  on 
component  evaluation  criteria  to  proposal  or  project  review 
outcomes  (e.g.,  [DOE,  1982;  Kostoff,  1992a]).  Some  studies  have 
been  done  in  which  reviewers'  ratings  of  research  papers  have  been 
compared  to  the  numbers  of  citations  received  by  these  papers  over 
time  [Bornstein,  1991a;  Bornstein,  1991b].  Correlations  between 
reviewers'  estimates  of  manuscript  quality  and  impact  and  the 
number  of  citations  received  by  the  paper  over  time  were  relatively 
low.  Bornstein  concludes,  after  an  extensive  survey  of  peer  review 
reliability  and  validity,  that:  "If  one  attempted  to  publish 
research  involving  an  assessment  tool  whose  reliability  and
validity  data  were  as  weak  as  that  of  the  peer  review  process,
there  is  no  question  that  studies  involving  this  psychometrically
flawed  instrument  would  be  deemed  unacceptable  for  publication."
[Bornstein,  1991b].

The  author  is  not  aware  of  large-scale  studies,  singly  or  in 
tandem,  that  have  related  peer  review  scores/rankings  of  proposals 
to  downstream  impacts  of  the  research  on  technology,  systems,  and 
operations,  although  some  efforts  toward  this  end  have  been 
initiated  [Van  den  Beemt,  1991].  This  type  of  study  would  require 
an  elaborate  data  tracking  system  over  lengthy  time  periods  which 
does  not  exist  today.  Thus,  the  value  of  peer  review  as  a 
predictive  tool  for  assessing  the  impact  of  research  on  an 
organization's  mission  (other  than  research  for  its  own  sake)  rests 
on  faith  more  than  on  hard  documented  evidence. 

COSTS  OF  PERFORMING  A  PEER  REVIEW 

Another  problem  with  peer  review  is  cost  [ASTEC,  1991; 
Buechner,  1974;  Hensley,  1980;  Kostoff,  1995d,  1996a].  The  true 
total  costs  of  peer  review,  as  will  be  shown,  can  be  considerable 
but  tend  to  be  ignored  or  understated  in  most  reported  cases. 
Because  there  are  many  different  types  of  peer  review,  it  is  very 
difficult  to  provide  a  total  cost  rule-of-thumb  for  generic  peer 
review.  Nevertheless,  consider  the  following  illustrative  example 
for  an  order  of  magnitude  estimate  on  total  research  program  peer 
review  costs  [Kostoff,  1995a].

Assume  that  an  interim  peer  review  is  desired  of  a  $1M/yr
program  at  a  laboratory.  The  review  mode  of  operation  will  be  to 
bring  a  panel  of  experts  to  the  laboratory  site  for  two  days,  and 
hear  presentations  from  the  principal  investigators.  Assume  that
the  panel  consists  of  ten  experts  in  research,  technology,  mission 
operations,  etc.,  and  that  eight  principal  investigators  will 
present  their  projects  to  the  panel.  The  loaded  cost  (salary  plus 
overhead)  for  each  panel  member  is  assumed  to  be  $150,000  per  year, 
and  the  loaded  cost  for  each  principal  investigator  is  assumed  to 
be  $125,000  per  year.  Direct  expenditures,  such  as  panel  per  diem 
and  travel  costs,  would  be  in  the  neighborhood  of  $6,000-8,000. 
Any  honoraria  would  increase  this  cost. 

Indirect  expenditures,  such  as  total  reviewer,  presenter, 
staff,  and  review  audience  time  spent  toward  the  review,  would  be 
in  the  range  of  $125,000  and  would  include  at  least  the  following: 

1.  Presenter  time  in  preparing  background  material  for 
reviewers  to  read  before  review,  preparing  the  presentation,  making 
dry  runs  for  management,  etc.  [$40,000  estimate;  80  person-days]; 

2.  Panel  member  time  for  reading  background  material  (papers, 
reports,  plans),  traveling  to  review,  spending  time  at  meeting,
writing  report,  etc.  [$48,000-60,000  estimate;  80-100  person-days];

3.  Agency  staff  time  for  identifying  and  soliciting  reviewers, 
establishing  review  and  coordinating  with  lab,  writing  reports, 
etc.  [$10,000  estimate;  20  person-days]; 

4.  Audience  (lab  management,  other  lab  personnel,  other  agency 
representatives,  etc.)  time  at  review  [$20,000  estimate;  40  person-
days].


The  main  conclusion  of  this  discussion  is  that  for  serious
panel-type  peer  reviews,  where  sufficient  expertise  is  represented 
on  the  panels,  total  real  costs  will  dominate  direct  costs.  This 
conclusion  would  also  be  true  for  mail-type  peer  reviews.  While 
the  total  costs  of  mail-type  peer  reviews  would  be  less  than  those 
of  panel-type  peer  reviews  due  to  the  absence  of  travel  costs,  the 
ratio  of  total  costs  to  direct  costs  for  mail-type  peer  reviews 
would  be  very  high.  The  major  contributor  to  total  costs  for 
either  type  of  review  is  the  time  of  all  the  players  involved  in 
executing  the  review.  With  high  quality  performers  and  reviewers, 
time  costs  are  high,  and  the  total  review  costs  can  be  a  non- 
negligible  fraction  of  total  program  costs,  especially  for  programs 
that  are  people  intensive  rather  than  hardware  intensive. 
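
The figures in the illustrative example can be tallied with the Python sketch below. The dollar estimates and person-day counts are those listed above; the 250 working days per year used to convert loaded annual costs into day rates is an assumption of this sketch and is not stated in the text.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption; not stated in the text


def day_rate(loaded_cost_per_year):
    """Loaded cost per person-day under the assumed working year."""
    return loaded_cost_per_year / WORKING_DAYS_PER_YEAR


# (low, high) estimates in dollars, from the itemized list in the example.
indirect = {
    "presenters": (40_000, 40_000),
    "panel members": (48_000, 60_000),
    "agency staff": (10_000, 10_000),
    "audience": (20_000, 20_000),
}
direct = (6_000, 8_000)  # panel per diem and travel

indirect_low = sum(low for low, _ in indirect.values())
indirect_high = sum(high for _, high in indirect.values())

# Ratio of indirect to direct cost, from least to most lopsided pairing.
ratio_low = indirect_low / direct[1]
ratio_high = indirect_high / direct[0]

# Total review cost as a fraction of one year of the $1M/yr program.
total_fraction_low = (indirect_low + direct[0]) / 1_000_000
total_fraction_high = (indirect_high + direct[1]) / 1_000_000
```

Under these assumptions the indirect items sum to $118,000-130,000, consistent with the "range of $125,000" quoted above, and exceed direct expenditures by roughly a factor of fifteen to twenty. The total comes to about twelve to fourteen percent of one year's program funding. The assumed 250-day year also reproduces the itemized estimates: a $150,000 panel member costs $600 per day, so 80-100 person-days cost $48,000-60,000.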

ETHICAL  ISSUES  IN  PEER  REVIEW 

In  the  research  profession,  there  is  a  plethora  of  ethical 
issues,  including  scientific  fraud,  scientific  misconduct, 
betraying  confidential  information,  and  unduly  profiting  from 
access  to  privileged  information.  There  are  both  legal  and 
unwritten/unspoken  agreements  and  penalties  which  underlie  the
maintenance  of  ethical  standards  in  these  areas.  One  subordinate
objective  of  peer  review,  whether  at  the  manuscript  [Fox,  1994], 
proposal,  or  program  level,  is  to  maintain  high  ethical  standards, 
especially  as  applied  to  fraud  and  misconduct.  Since  many  of  the 
fraud  and  misconduct  violations  have  occurred  in  the  written 
technical  product,  most  of  the  reported  applications  of  peer  review 
in  this  area  have  emanated  from  journal  peer  review  [Fielder,  1995; 
Goodstein,  1995;  Gupta,  1996;  Keown,  1996;  Mokrasch,  1988;  Moran, 
1992;  Southgate,  1992].  The  maintenance  of  ethical  standards  in
these  areas  tends  to  be  through  self-policing  by  the  research
community.  The  author  has  seen  no  program  peer  reviews  in  which 
fraud  and  misconduct  were  uncovered,  and  has  not  identified  any 
such  cases  in  the  literature. 

There  is  a  fundamental  ethical  paradox  which  underlies  any 
form  of  research  peer  review.  For  the  review  process  to  have 
credibility,  experts  must  be  employed,  either  for  the  right  job
function  or  the  job  right  function.  Contrary  to  popular  opinion,
it  has  been  the  author's  experience,  based  on  directed  experiments 
and  on  personal  observations  during  the  conduct  of  reviews,  that 
there  are  very  few  real  experts  in  any  specific  research  field. 
Armstrong  [Armstrong,  1997]  draws  a  similar  conclusion  relative  to 
manuscript  peer  review,  to  the  effect  that  the  reviewers  may  work 
on  similar  areas  but  not  the  same  specific  problem,  so  that  the 
reviewers  have  less  experience  on  the  total  problem  than  do  the 
authors.  Thus,  in  order  to  obtain  real  experts  for  a  panel,  at 
least  to  evaluate  the  job  right  aspects  of  the  research,  a
relatively  small  community  must  be  accessed.  Usually,  the  members 
of  this  community  are  acquainted  with  each  other,  and  are  either 
research  collaborators  or  research  competitors.  They  may  compete 
for  funds  or  awards  or  prestige  or  promotions,  or  other  types  of
recognition.  Thus,  there  is  an  inherent  bias/  conflict  of  interest 
in  the  process  when  real  experts  are  desired  as  reviewers. 

Usually,  in  research  program  peer  review,  there  are  (or  should 
be)  documents  which  reviewers  sign  to  protect  the  confidentiality 
of  the  research  being  reviewed,  but  pragmatically  it  is  the 
adherence  to  the  unwritten  and  unspoken  ethical  standards  which 
restricts  the  unwarranted  use  of  proprietary  and  sensitive 
information.  There  are  also  legal  protections,  and  recently  there 
have  been  court  cases  brought  by  those  who  felt  their  confidences 
and  proprietary  research  had  been  violated  through  illegal 
expropriation  of  the  results  for  personal  reviewer  gain. 

No  matter  what  documents  reviewers  sign,  nor  what  desires  they 
have  to  adhere  to  the  highest  ethical  standards,  they  cannot  help 
but  be  influenced  by  the  privileged  information  to  which  they  have 
access.  The  transfer  of  knowledge  occurs  through  many  pathways, 
and  listening  to  detailed  technical  presentations  or  reading 
technical  proposals  are  probably  two  of  the  more  effective.  Thus, 
the  operative  solution  to  the  ethical  dilemma  posed  by  access  to 
technical  material  is  the  principle  of  compromise  rather  than  the 
compromise  of  principles.  The  ethical  reviewer  takes  no  conscious 
overt  actions  to  reveal  confidences  or  profit  unduly  from 
participation  in  the  peer  review,  but  rather  accepts  as  his  reward 
for  participation  the  satisfaction  of  having  aided  the  larger 
research  enterprise  and  having  improved  his  thought  processes  from 
exposure  to  different  ideas.  If  the  larger  use  of  research  program 
peer  review  becomes  a  reality,  and  if  the  outcomes  are  used  to 
influence  budgetary  decisions,  then  more  efforts  need  to  be  devoted 
to  insure  adherence  to  some  of  the  ethical  standards  discussed 
here. 


ALTERNATIVES  TO  PEER  REVIEW 

This  paper  has  identified  a  number  of  problems  associated  with 
the  use  of  peer  review.  These  problems  conceptually  transcend  the 
different  peer  review  applications  of  program,  proposal,  and 
manuscript  evaluation,  although  the  implementation  severity  of 
different  problems  is  different  for  each  of  the  applications. 
There  have  been  a  number  of  proposals  for  peer  review  modifications 
or  complete  alternatives  [Forsdyke,  1991;  Greene,  1991;  Roy,  1981, 
1984,  1985;  Smith,  1988;  Wick,  1996;  Wood,  1997],  in  attempts  to 
overcome  the  most  egregious  aspects  of  peer  review.  Most  of  these 
alternative  concepts  focus  specifically  on  research  proposal  peer 
review,  although  some  of  their  component  ideas  apply  to  the  other 
applications  of  peer  review  as  well.  Two  of  the  more  widely  known 
alternatives  will  now  be  presented  and  critiqued. 

Bicameral  Review 

A  modified  form  of  peer  review  for  project  selection  has  been 
propounded  in  recent  years  by  some  Canadian  scientists  [Berezin, 
1995;  Forsdyke,  1991].  This  methodology  has  been  termed  "Bicameral 
Review"  by  its  originator,  Dr.  Forsdyke,  and  its  essence  is  as
follows.

The  structure  of  Bicameral  Review  is  founded  on  the  assumption 
that  the  research  funding  system  is  highly  error-prone  due  to  the 
inherent  uncertainty  of  predicting  the  outcome  of  basic  research. 
If  an  evaluation  system  is  highly  error-prone,  then  that  error- 
proneness  has  to  be  taken  into  account  in  system  design.  Two 
principles  of  decision-making  in  uncertain  environments  are:  1) 
place  most  weight  on  parameters  most  likely  to  be  assessed  with 
some  degree  of  objectivity,  and  2)  hedge  your  bets. 

In  Bicameral  Review,  grant  applications  are  divided  into  a 
major  retrospective  part  (track  record  of  proposers) ,  and  a  minor 
prospective  part  (the  work  proposed) ,  which  are  routed  separately. 
The  retrospective  part  only  is  subjected  to  peer  review.  The
prospective  part  is  subjected  to  in-house  review  by  the  agency, 
solely  with  respect  to  budget  justification.  The  peers  are 
required  to  assess  not  just  productivity,  but  productivity  per 
dollar  received.  Furthermore,  they  have  to  factor  in  the 
experience  of  the  applicant.  Young  researchers  are  given  more 
funding  "rope"  (the  benefit  of  the  doubt),  until  they  have
established  a  record.  Funding  is  allocated  on  a  sliding  scale, 
replacing  existing  sharp  fund-no  fund  cutoffs.  Only  those  at  the 
very  top  of  the  funding  scale  would  get  all  the  funds  they  needed 
to  complete  the  work  in  a  reasonable  time.  As  the  merit  rating  of 
the  projects  decreased  down  the  funding  scale,  the  fraction  of 
requested  funds  would  decrease  as  well. 
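
A sliding scale of this kind might look like the following Python sketch. The text does not specify the shape of the scale, so a linear decline by merit rank is assumed here, from full funding at the top to a hypothetical floor of twenty percent at the bottom. The function name, the floor, and the sample proposals are all illustrative.

```python
def sliding_scale(requests, floor=0.2):
    """Award a fraction of each request that declines linearly with merit rank.

    requests: list of (name, merit_rating, dollars_requested) tuples.
    floor:    fraction of the request awarded to the lowest-ranked proposal
              (hypothetical; the text gives no value).
    Returns a dict mapping each name to dollars awarded.
    """
    ranked = sorted(requests, key=lambda r: r[1], reverse=True)
    n = len(ranked)
    awards = {}
    for rank, (name, _merit, requested) in enumerate(ranked):
        # Top-ranked proposal gets everything it asked for; the fraction
        # then falls in equal steps down to the floor.
        fraction = 1.0 if n == 1 else 1.0 - (1.0 - floor) * rank / (n - 1)
        awards[name] = fraction * requested
    return awards


# Hypothetical proposals, each requesting $100K.
awards = sliding_scale([
    ("P1", 9.0, 100_000.0),
    ("P2", 6.0, 100_000.0),
    ("P3", 3.0, 100_000.0),
])
```

With these inputs the three proposals receive approximately 100, 60, and 20 percent of their requests. The sketch does not enforce a fixed total budget; a real scheme would need to scale the fractions so that awards sum to the funds available.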

Productivity-Based  Formula  Systems 

A  non-peer  review  alternative  has  been  proposed  [Roy,  1981, 
1985],  based  on  the  principles  that  past  success  is  the  best 
predictor  of  future  performance,  supporting  small  groups  on  a 
continuing  basis  for  a  reasonable  time  period  increases 
probabilities  of  success  and  system  efficiencies,  and  most 
innovative  science  is  done  with  a  minimum  of  micro-management. 
This  alternative  proposes  that  researchers  be  funded  essentially 
based  on  track  record,  and  provides  an  algorithm  for  allocating 
funds.  In  one  algorithmic  incarnation  [Roy,  1985],  the  dollars 
awarded  would  be  proportional  to  some  weighted  sum  of  numbers  of 
publications,  numbers  of  advanced  degrees,  dollar  volume  of 
research  support  from  mission  agencies,  and  dollar  volume  of 
research  support  from  industry,  and  the  award  would  be  to  a 
research  unit  (departments,  etc.).  Again,  the  underlying  principle
is  that  performance  rather  than  promise  will  provide  a  much  firmer 
basis  for  public  accountability.  New  investigators  added  to  a 
research  unit  would  have  extra  shares  added  to  the  base  formula 
allocation. 
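
A formula of this type can be sketched in Python as follows. The four terms are those named in the text; the weights, the number of extra shares per new investigator, and the sample research units are hypothetical, since the text says only that the award is proportional to "some weighted sum".

```python
# Hypothetical weights; the text specifies the terms but not their values.
WEIGHTS = {
    "publications": 1.0,
    "advanced_degrees": 2.0,
    "mission_agency_dollars": 1e-5,
    "industry_dollars": 1e-5,
}
NEW_INVESTIGATOR_SHARES = 3.0  # hypothetical extra shares per new investigator


def unit_shares(record):
    """Weighted sum of a research unit's track-record terms, plus extra
    shares for any new investigators added to the unit."""
    base = sum(WEIGHTS[key] * record[key] for key in WEIGHTS)
    return base + NEW_INVESTIGATOR_SHARES * record.get("new_investigators", 0)


def allocate(pool, units):
    """Split a funding pool among units in proportion to their shares."""
    shares = {name: unit_shares(record) for name, record in units.items()}
    total = sum(shares.values())
    return {name: pool * s / total for name, s in shares.items()}


# Hypothetical research units.
units = {
    "Dept X": {"publications": 40, "advanced_degrees": 10,
               "mission_agency_dollars": 2_000_000, "industry_dollars": 500_000},
    "Dept Y": {"publications": 20, "advanced_degrees": 5,
               "mission_agency_dollars": 500_000, "industry_dollars": 0,
               "new_investigators": 2},
}
funding = allocate(1_000_000, units)
```

The choice of weights largely determines the outcome, which is why the text's later commentary stresses that any such formula needs interpretation by peers.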

Author's  Commentary  on  Alternatives 

Ideally,  a  research  proposal  evaluation  process  should  be  able 
to  allocate  funds  to  the  ideas  with  the  greatest  potential, 
independent  of  the  source  of  these  ideas.  Such  a  process  should  be 
able  to  include  ideas  from  established  researchers  with  strong 
track  records,  established  researchers  with  weak  track  records,  and 
new  researchers  with  no  track  records.  It  should  be  able  to  cover
researchers  from  academia,  government,  and  industry,  ranging  from
one-person  operations  to  very  large  organizations,  and  cover
classified  and  non-classified  work  with  different  venues  and
cultures  for  reporting  research  results.  The  allocation  process 
should  incorporate  the  best  technical  judgements  in  arriving  at 
final  decisions,  recognizing  the  uncertainties  involved  in 
projecting  the  outcomes  of  fundamental  research. 

The  two  alternative  approaches  selected  place  heavy  emphasis 
on  awards  to  established  researchers  with  strong  track  records, 
although  they  differ  in  how  the  track  records  would  be  determined, 
with  Bicameral  using  peers  and  productivity-based  using  a  formula. 
Both  minimize  the  use  of  true  technical  experts  in  the  evaluation 
of  the  prospective  portion  of  proposed  research.  In  actual 
practice,  these  alternatives  would  not  differ  quite  as 
significantly  from  existing  peer  review  processes  as  might  be 
imagined  from  first  reading.  As  stated  previously  in  this  paper, 
analyses  have  shown  that  Team  Quality,  a  euphemism  for  performer 
track  record,  is  the  dominant  factor  in  determining  reviewer 
overall  quality  score  for  existing  and  proposed  research.  Thus, 
both  the  existing  and  alternative  approaches  de  facto  place  heavy 
emphasis  on  track  record.  The  real  difference  between  the 
alternatives  and  the  existing  approaches,  in  the  author's  opinion, 
is  the  use  of  technical  experts  in  evaluating  the  prospective 
portion  of  the  proposal. 

While  both  alternative  approaches  would  reduce  the  cost  of 
submitting  proposals  to  some  degree,  would  reduce  the  impacts  of 
reviewer  bias,  would  reduce  substantially  whatever  pirating  exists 
of  novel  ideas  by  competitors,  and  would  eliminate  some  unnecessary 
time  expenditures  in  the  review  processes,  they  have  some 
drawbacks.  Extremely  heavy  emphasis  on  track  record  to  the 
exclusion  of  expert  judgement  on  proposed  concepts  promulgates 
continuation  of  orthodox  mainstream  approaches  by  increasing  the 
obstacles  to  new  entrants  into  the  research  arena.  Lack  of 
technical  expertise  in  the  judgement  of  proposed  research  could 
lead  to  more  non-technical  factors  predominating  in  the  selection 
process,  and  the  relative  ascendance  of  form  over  substance  in  the 
evaluation. 

In  a  zero-sum  game,  the  Bicameral  Review  process  appears 
to  allocate  some  funds  from  the  'best'  proposals  to  the  'worst' 
proposals  because  of  the  sliding  scale  and  elimination  of  the  sharp 
cutoff.  It  does,  however,  provide  a  'safety-net'  which  allocates 
some  funding  to  all,  or  almost  all,  researchers. 

The productivity-based system has some analogies to the
present GPRA approach addressed in the companion Science article
[Kostoff, 1997h], and suffers from many of the same drawbacks. Use
of  any  metric  or  combination  of  metrics  as  a  stand-alone  approach 
for  evaluating  research  is  subject  to  error.  The  metrics  chosen 
may  or  may  not  be  a  valid  indicator  of  research  quality; 
interpretation  by  peers  is  required  to  validate  the  credibility  of 
the metrics. The formula-based approach has the negative potential
of  driving  researchers  to  achieve  numerical  output  targets  rather 
than fundamental understanding.

The  productivity  approach  is  similar  to  a  recursive  system  of 
equations, and if the initial conditions are flawed, the final
figure  of  merit  would  be  flawed.  For  example,  one  of  the  formula 
terms  is  dollars  received  for  research  from  mission  agencies. 
Suppose  a  research  team  had  received  major  grants  that  were 
'earmarked'  in  legislation.  This  could  lead  to  better  numbers  for 
at  least  two  of  the  other  formula  terms  as  well,  numbers  of 
graduate  students  and  papers  produced,  and  then  result  in  a  high 
overall  figure  of  merit  that  was  not  necessarily  related  to  the 
intrinsic  quality  of  the  research  program.  This  allocation  based 
on  flawed  initial  conditions  would  recur  each  year  until  it  became 
a  self-perpetuating  system,  even  after  the  'earmarking'  was 
terminated.  Thus,  if  any  formula  or  combination  of  quantitative 
indicators is used, it must be accompanied by and subordinate to
expert  peer  review,  in  order  to  avoid  the  occurrence  of  situations 
such  as  the  one  above. 
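
The persistence of a flawed initial condition can be illustrated with a minimal numerical sketch. The figure of merit, its weights, the budget, and the team values below are hypothetical and are not taken from any proposed productivity-based system; the sketch assumes only that each year's figure of merit is driven mostly by the prior year's funding (which buys graduate students and papers) and only weakly by intrinsic quality.

```python
def simulate(initial_funding, quality, years=10, budget=200.0):
    """Reallocate a fixed budget each year in proportion to a figure of merit.

    Hypothetical assumption: merit = 0.9 * prior funding + 0.1 * budget * quality,
    i.e. the formula mostly rewards what earlier funding has already bought.
    """
    funding = list(initial_funding)
    history = [funding[:]]
    for _ in range(years):
        merit = [0.9 * f + 0.1 * budget * q for f, q in zip(funding, quality)]
        total = sum(merit)
        funding = [budget * m / total for m in merit]
        history.append(funding[:])
    return history


# Two teams of equal intrinsic quality. Team A starts with an earmarked
# advantage; no earmark is applied in any later year.
history = simulate(initial_funding=[150.0, 50.0], quality=[0.5, 0.5])
for year, (team_a, team_b) in enumerate(history):
    print(f"year {year:2d}: team A = {team_a:6.1f}   team B = {team_b:6.1f}")
```

Under these assumed weights, Team A still receives more than Team B a decade after the earmark has ended, although the two teams are of equal quality. The closer the weight on prior funding is to one, the more nearly permanent the advantage becomes.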

These  alternatives,  and  others  of  similar  nature,  are  based  on 
the  premise  that  the  peer  review  selection  process  does  not  yield 
the  best  research,  and  the  tremendous  expenditures  of  time  and 
energy  in  generating  proposals  do  not  justify  the  continuance  of 
such  an  inexact  process.  The  validity  of  this  basic  premise  can  be 
challenged.  While  peer  review  has  its  imperfections  and 
limitations,  there  is  little  evidence  that  the  best  researchers  and 
ideas  are  going  without  funding,  and  far  less  evidence  that  the 
alternatives  above  would  improve  the  situation. 

SCIENCE  COURT 

A  non-standard  peer  review  approach  for  concept  evaluations  is 
the  Science  Court.  As  in  a  legal  procedure,  it  has  well  defined 
advocates,  critics,  a  jury,  etc.  It  is  a  unique  and  potentially 
powerful  technique,  but  like  any  tool,  can  be  misused  if  not 
understood  and  applied  properly.  It  was  applied  in  the  magnetic 
fusion  office  by  the  author  to  a  review  of  alternate  fusion 
concepts  in  1977  [DOE,  1978]. 

The  general  format  selected  for  the  evaluation  was  a  panel 
review  by  selected  evaluators  with  an  adversary  type  of  procedure. 
The  participants  and  their  roles  in  the  evaluation  are  described 
below. 

The steering committee consisted of fusion office
representatives.  The  chief  responsibilities  of  this  committee  were 
(1)  to  organize  the  evaluation,  (2)  to  define  the  evaluation 
criteria,  (3)  to  choose  members  of  the  Evaluation  panel,  (4)  to 
assist  the  Evaluation  panel  in  the  reviews,  and  (5)  to  receive  the 
evaluators' conclusions and recommendations and draft a final
report  to  the  fusion  office. 

The  Evaluation  panel  was  composed  of  plasma  physicists,  fusion 
reactor  systems  experts,  and  a  representative  of  the  utility 
industry.  The  panel  did  not  include  active  proponents  of  any  of 
the  concepts  under  consideration.  In  case  of  a  remote  conflict  of 
interest,  a  panel  member  excused  himself  from  the  deliberation  on 
the particular concept involved. The panel was responsible for the
technical  evaluation  of  all  concepts. 

The  Advocates  of  a  concept  were  those  scientists  and  engineers 
who  were  working  on  a  particular  concept.  The  Advocates  were 
responsible  for  providing  and  defending  scientific  results  and 
projections,  as  well  as  the  technology  and  attractiveness  of  the 
reactor  embodiment.  A  Chief  Advocate  was  designated  to  coordinate 
the  activities  of  the  Advocates. 

Critics  were  chosen  for  their  special  expertise  in  an  area  of 
physics  or  engineering  that  was  important  to  a  particular  concept. 
The  Critics'  responsibility  was  to  ferret  out  crucial  physics  and 
technology questions and to aid the Evaluation Panel in the review
of  experimental  results  and  theoretical  models.  Proponents  of  one 
concept  in  some  cases  served  as  critics  in  the  evaluation  of 
another  concept.  One  person  was  chosen  as  a  Chief  Critic  and  was 
given  the  responsibility  of  coordinating  the  activities  of  the 
Critics. 

Any  of  the  participants  (Advocates,  Critics,  or  the  Evaluation 
Panel)  were  allowed  to  utilize  outside  experts  as  they  deemed 
appropriate.  This  procedure  probably  had  more  debate  and  surfacing 
of  crucial  issues  than  any  other  concept  evaluation  seen  by  the 
author.  However,  it  was  time-consuming  compared  to  a  standard 
panel  assessment. 

IV-A-3.  PEER  REVIEW  PRACTICES 

PROPOSED  PROGRAMS 


1)  NSF 

The two largest Federal sponsors of basic research are the
National Institutes of Health (NIH) and the National Science
Foundation (NSF) [NSF, 1996]. The NSF peer review process of
research proposals illustrates how potential research impact
influences selection of new research areas. In the NSF process,
proposals received are assigned to program officers for review.
The  program  officers  select  external  peer  reviewers  and  use  mail 
and/or  panel  approaches  to  have  the  proposals  assessed  and  rated. 
The  program  officers  then  perform  their  own  assessment  of  the 
proposals  and  forward  their  recommendations  to  higher  levels. 
These  recommendations  are  rarely  overturned  [Frazier,  1987]. 

According to the 1987 version of the NSF Brochure, Information for
Reviewers, reviewers use four criteria to assess the proposals:

1.  Research  Performance  Competence; 

2.  Intrinsic  Merit  of  the  Research; 

3.  Utility  or  Relevance  of  the  Research; 

4.  Effect  of  the  Research  on  the  Infrastructure  of  Science  and 
Engineering. 

These  criteria  were  adopted  by  the  National  Science  Board  in 
1981  [NSF,  1997]. 

Research impacts are evaluated through the second, third, and
fourth criteria. The second criterion, Intrinsic Merit,
incorporates impact of the proposed research on other research
fields in its definition and is a measure of the nearer term impact
of the proposed research. The third criterion, Utility, addresses
potential contribution to an extrinsic goal such as a new
technology. The fourth criterion, Infrastructure, incorporates
impact on the nation's research, education, and human resource base.

In  1996,  the  NSF  merit  review  process  was  evaluated  by  a  task 
force.  The  National  Science  Board  recommended  that  the  new  review 
criteria proposed in the final task force report [NSF, 1997] be
approved  for  implementation  on  October  1,  1997.  The  specific  task 
force  recommendations  are  that  the  following  two  criteria  be 
adopted  in  place  of  the  four  criteria  that  are  currently  used. 

1.  What  is  the  Intellectual  merit  of  the  proposed  activity? 

The  following  are  suggested  questions  to  consider  in  assessing 
how  well  the  proposal  meets  this  criterion:  How  important  is  the 
proposed  activity  to  advancing  knowledge  and  understanding  within 
its  own  field  and  across  different  fields?  How  well  qualified  is 
the  proposer  (individual  or  team)  to  conduct  the  project?  (If 
appropriate,  please  comment  on  the  quality  of  prior  work.)  To  what 
extent  does  the  proposed  activity  suggest  and  explore  creative  and 
original  concepts?  How  well  conceived  and  organized  is  the 
proposed  activity?  Is  there  sufficient  access  to  resources? 

2.  What  are  the  broader  impacts  of  the  proposed  activity? 

The  following  are  suggested  questions  to  consider  in  assessing 
how  well  the  proposal  meets  this  criterion:  How  well  does  the 
activity  advance  discovery  and  understanding  while  promoting 
teaching,  training,  and  learning?  How  well  does  the  proposed 
activity  broaden  the  participation  of  underrepresented  groups 
(e.g.,  gender,  ethnicity,  geographic,  etc.)?  To  what  extent  will 
it  enhance  the  infrastructure  for  research  and  education,  such  as 
facilities,  instrumentation,  network,  and  partnerships?  Will  the 
results  be  disseminated  broadly  to  enhance  scientific  and 
technological  understanding?  What  may  be  the  benefits  of  the 
proposed  activity  to  society? 

The task force further recommended that a cover sheet be
attached  to  the  proposal  review  form,  which  presents  the  context 
for  using  the  criteria.  The  suggested  language  for  this  cover 
sheet  is  as  follows: 

Important! Please Read Before Beginning Your Review!

In  evaluating  this  proposal,  you  are  requested  to  provide 
detailed  comments  for  each  of  the  two  NSF  Merit  Review  Criteria 
described  below.  Following  each  criterion  is  a  set  of  suggested 
questions  to  consider  in  assessing  how  well  the  proposal  meets  the 
criterion.  Please  respond  with  substantive  comments  addressing  the 
proposal's strengths and weaknesses. In addition to the suggested
questions, you may consider other relevant questions that address
the NSF criteria (but you should make this explicit in your
review). Further, you are asked to address only questions which
you  consider  relevant  to  the  proposal  and  that  you  feel  qualified 
to  make  judgements  on. 

When assigning your summary rating, remember that the two
criteria need not be weighted equally. Emphasis should depend upon
either (1) additional guidance you have received from NSF or (2)
your own judgement of the relative importance of the criteria to
the proposed work. Finally, you are requested to write a summary
statement  that  explains  the  rating  that  you  assigned  to  the 
proposal.  This  statement  should  address  the  relative  importance  of 
the  criteria  and  the  extent  to  which  the  proposal  actually  meets 
both  criteria. 

Regarding the 'ratings' issue, which was highlighted in the
Discussion  Report,  the  task  force  recommended  that  the  NSF 
'generic'  proposal  review  form  provide  for  the  following: 

1. separate comments for each criterion

2. a single composite rating

3. a summary recommendation (narrative) that addresses both
criteria

In  the  new  process,  research  impacts  are  the  focus  of  the 
second  criterion.  These  include  impacts  on  infrastructure, 
education,  science,  technology,  and  diversity.  Thus,  not  only  are 
technical  impacts  considered,  but  potential  socio-political  impacts 
are  considered  as  well.  Finally,  it  is  unclear  how  other  unwritten 
criteria,  such  as  government  vs  industry  appropriateness  for 
funding,  which  may  be  important  for  a  specific  project/program, 
would  impact  the  composite  rating. 

2)  NIH 

In  the  NIH  process,  proposals  are  sent  to  initial  peer 
review  groups,  composed  mainly  of  active  researchers  at  colleges 
and  universities,  where  they  are  reviewed  for  scientific  and 
technical  merit.  After  receiving  a  priority  rating  from  the  peer 
reviewers,  the  proposals  are  then  sent  to  a  statutorily  mandated 
advisory  council,  composed  of  scientists  and  public  members,  for  a 
program  relevance  review.  After  the  council  members  recommend 
action  to  be  taken  on  the  proposals  (usually  concurrence  with  the 
peer  group  recommendations,  but  sometimes  special  action  [Frazier, 
1987]),  the  institute  staff  rank  the  proposals  and  initiate  a 
funding  strategy. 

In  response  to  a  perceived  need  to  refocus  the  review  of  grant 
applications  on  the  quality  of  the  science  and  the  impact  it  might 
have  on  the  field,  rather  than  on  details  of  technique  and 
methodology,  NIH  has  developed  five  new  criteria  for  initial  review 
of  proposals  for  implementation  in  October  1997.  Reviewers  will  be 
asked  to  apply  the  criteria  in  judging  whether  the  proposed 
research is likely to have a substantial impact on advancing the
goals of NIH-supported research: advancing understanding of
biological systems, improving control of disease, and enhancing
health. The new rating criteria are:

Significance: Does this study address an important problem? If the
aims  of  the  application  are  achieved,  how  will  scientific 
knowledge  be  advanced?  What  will  be  the  effect  of  these  studies  on 
the  concepts  or  methods  that  drive  this  field? 

Approach:  Are  the  conceptual  framework,  design,  methods,  and 
analyses  adequately  developed,  well-integrated,  and  appropriate  to 
the  aims  of  the  project?  Does  the  applicant  acknowledge  potential 
problem  areas  and  consider  alternative  tactics? 

Innovation:  Does  the  project  employ  novel  concepts,  approaches  or 
method?  Are  the  aims  original  and  innovative?  Does  the  project 
challenge  existing  paradigms  or  develop  new  methodologies  or 
technologies? 

Investigator:  Is  the  investigator  appropriately  trained  and  well 
suited  to  carry  out  this  work?  Is  the  work  proposed  appropriate  to 
the  experience  level  of  the  principal  investigator  and  other 
researchers  (if  any)? 

Environment:  Does  the  scientific  environment  in  which  the  work  will 
be  done  contribute  to  the  probability  of  success?  Do  the  proposed 
experiments  take  advantage  of  unique  features  of  the  scientific 
environment  or  employ  useful  collaborative  arrangements?  Is  there 
evidence  of  institutional  support? 

In  assigning  a  single  global  score  for  each  application,  the 
reviewers  are  to  consider  all  criteria,  weighting  each  criterion 
as  appropriate  for  each  application. 

It appears that only the first criterion, Significance, relates to
impact,  and  can  include  the  relatively  near  term  impact  on  allied 
research  fields.  Broader  impact  and  relevance  issues  appear  to  be 
the  purview  of  the  advisory  councils.  The  council  members  are 
asked  to  assess  the  fairness  and  appropriateness  of  the  initial 
scientific  review  as  well  as  the  proposal's  relevance  to  institute 
research  program  goals  and  broader  societal  health-related 
matters.

3)  ONR 

The  ONR  does  not  require  formal  peer  review  of  individual 
research  grants,  but  leaves  the  choice  of  peer  review  to  its 
scientific  officers.  Circa  1992,  it  required  a  competitive  process 
among  internal  Navy  organizations  (claimants)  with  external 
reviewers  for  those  accelerated  program  proposals  which  constituted 
about 30 per cent of the total ONR program [Kostoff, 1988, 1991a,
1992a]. The claimants that won the competition then went to the
technical community (if their charter was extramural) and
advertised their areas of interest for proposals, or, if their
charter was intramural, performed the work in-house.

In  a  detailed  description  of  the  competition  [Kostoff,  1988], 
all  the  accelerated  programs  proposed  by  the  claimants  (ARIs)  were 
categorized  into  areas  of  similar  science,  and  the  proposals  in 
each  area  were  evaluated  by  a  panel  of  experts  external  to  ONR. 
The  written  portion  of  the  evaluation  required  numbers  and  comments 
for  factors  related  to  research  quality  and  Navy  relevance.  In 
this  process,  the  factors  on  the  scoresheet  relating  to  potential 
research  impact  estimation  were: 

1. Research Merit (RM);

2. Potential Impact on Naval Needs (PINN);

3. Potential for Transition or Utility (PTU).

The  Research  Merit  criterion  incorporates  the  potential  impact 
of  the  research,  if  successful,  on  allied  research  areas.  The 
Potential  Impact  on  Naval  Needs  criterion  deals  with  downstream 
impact  of  the  proposed  research  on  naval  systems  and  operations. 
The  Potential  for  Transition  or  Utility  criterion  incorporates  the 
potential  nearer  term  impacts  of  the  proposed  research.  Transition 
refers  to  the  actual  transfer  of  research  programs  to  development 
and  Utility  refers  to  other  mechanisms  by  which  a  program's  results 
would  be  transmitted  to,  and  used  by,  the  technical  community. 

A  key  component  of  this  process  was  the  use  of  mixed  levels  of 
reviewers  on  the  panels  to  evaluate  the  different  potential  impacts 
of  research.  The  panels  included  bench-level  researchers  to 
address  the  impact  of  the  proposed  research  on  the  field  itself; 
broad  research  managers  to  address  potential  impact  on  allied 
research fields; technologists to address potential impact on
technology  and  the  potential  of  the  research  to  transition  to 
higher  levels  of  development;  systems  specialists  to  address 
potential  impact  on  systems  and  hardware;  and  operational  naval 
officers  to  address  the  potential  impact  on  naval  operations.  The 
presence  of  reviewers  with  different  research  target  perspectives 
and  levels  of  understanding  on  one  panel  provided  a  depth  and 
breadth  of  comprehension  of  the  different  facets  of  the  research 
impact  that  could  not  be  achieved  by  segregating  the  science  and 
utility  components  into  separate  panels  and  discussions.  The 
interplay  among  reviewers  coming  from  different  perspectives 
allowed  each  reviewer  to  incorporate  elements  of  other  perspectives 
into  his  decisionmaking  process. 

A  multiple  regression  analysis  showed  RM  to  be  the  most 
important  factor  in  determining  the  bottom  line  score  [Kostoff, 
1992a]. PINN did not weigh as heavily in the reviewers' bottom
line  score  as  did  PTU.  The  reviewers  weighed  nearer-term  impact 
more  heavily  in  their  bottom  line  decisions,  as  evidenced  by  the 
higher  correlations  of  PTU.  Since  the  study  also  showed  that  the 
bulk  of  the  proposed  ARIs  was  viewed  by  the  reviewers  as  basic 
research,  and  since  the  (possibly  far)  downstream  naval  impact  of 
basic  research  may  not  be  evident  in  many  cases,  it  is  not 
surprising that the more identifiable near-term impacts, such as
transition to exploratory development or utility of results by
other researchers, would affect reviewers' bottom line decisions
more  than  the  long  term  impacts. 

4)  STW-NETHERLANDS 

The  Dutch  Technology  Foundation  (STW)  was  founded  in  1981. 
One  of  its  main  functions  is  to  fund  university  research  that  is  of 
high  scientific  quality  and  has  the  potential  to  lead  to  results 
that  can  be  used  by  external  bodies.  In  1981,  STW  opted  for  a  new 
system  for  the  assessment  and  appraisal  of  research  proposals  from 
individual researchers (Van den Beemt, 1991, 1997). STW devised
this  new  system  in  order  to  minimize  the  problems  of  selection  by 
large  committees,  by  colleagues,  by  a  few  peers  only  or  by 
organizations  belonging  to  the  discipline  concerned. 

The  system  operates  as  follows:  All  applications  belonging  to 
the  broad  field  of  technology  and  engineering  sciences  are  welcome. 
Every  application  is  sent  initially  to  six  peers  who  are 
specialists  in  the  topic  covered  by  the  proposal;  some  are 
university  staff,  others  work  in  industry.  STW  asks  peers,  first 
by  telephone  and  later  by  mail,  to  give  comments  based  on  two 
criteria:  scientific  quality  and  utilization  potential. 

These criteria incorporate the following sub-criteria:

Subcriteria relating to scientific quality: competence of a team,
originality  of  the  proposal,  effectiveness  of  the  proposed  method, 
the  program  itself,  time  schedule,  available  infrastructure  and 
estimated  costs. 

Subcriteria  relating  to  utilization  potential:  applicability 
of  the  results,  commercial  outcomes,  long-term  contribution  to 
technology,  influence  on  the  competitive  status  of  Dutch  industry 
and  the  importance  of  patents  in  the  field. 

From  the  comments  received,  the  program  officer  at  STW 
compiles  a  document  in  which  the  comments  are  sorted  according  to 
sub-criteria.  This  document  is  then  sent  to  the  principal 
investigator  who  is  allowed  to  reply  to  each  comment;  the 
investigator's  actual  words  are  then  typed  in  italics  directly 
under  each  comment.  The  complete  document,  called  a  protocol, 
provides  information  for  and  against  the  proposal.  When  the 
protocols  for  20  proposals  (regardless  of  the  topics  concerned)  are 
ready,  a  jury  is  formed  consisting  of  12  highly  qualified  persons 
coming  from  universities,  government  laboratories  and  industry. 
Their  disciplines  and  backgrounds  vary  widely.  No  jury  member 
knows  who  else  is  on  the  jury;  names  are  not  divulged.  The  work  is 
done  free  of  charge  and  each  member  of  the  jury  is  only  allowed  to 
participate  once:  the  next  20  proposals  are  handled  by  a  new  jury. 

The  STW  board  gives  a  grant  to  at  least  the  best  8  proposals. 
This  minimum  grant  percentage  of  40  per  cent  is  never  influenced  by 
resource  allocations.  If  STW  resources  were  to  become  insufficient 
to  operate  this  system,  STW  would  stop  accepting  proposals  for  a 
while. 
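
The batch rule above (at least the best 8 of every 20 proposals, or 40 per cent) can be sketched as follows. How the 12 jury members' individual judgements are combined into a ranking is not described here, so the mean score used below is an assumption, and the proposal labels and scores are invented purely for illustration.

```python
from statistics import mean


def select_grants(jury_scores, minimum_funded=8):
    """Return the proposals funded from one batch.

    Assumption (not from the STW description): proposals are ranked by the
    mean of the jury members' scores, highest first.
    """
    ranked = sorted(jury_scores, key=lambda pid: mean(jury_scores[pid]), reverse=True)
    return ranked[:minimum_funded]


# Hypothetical batch: 20 proposals, each scored from 1 to 10 by 12 jury members.
batch = {
    f"P{n:02d}": [(n * 7 + j * 3) % 10 + 1 for j in range(12)]
    for n in range(1, 21)
}
funded = select_grants(batch)
print(f"{len(funded)} of {len(batch)} funded = {len(funded) / len(batch):.0%}")
```

Because the number funded per batch is fixed rather than set by a score cutoff, the grant percentage does not vary with the resources available in a given period, which is the property the STW board emphasizes.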

According  to  its  proponents,  this  procedure  has  proved  to  be 
reproducible,  and  in  the  Netherlands  it  is  widely  accepted. 
Because the system is reproducible and objective, STW gets hardly
any  resubmissions.  A  proposal  resubmitted  to  STW  will  be  almost 
certain  to  receive  the  same  assessment  as  the  original  proposal. 
A  notable  feature  of  the  procedure  is  that  it  is  very  dynamic:  for 
instance,  there  are  no  fixed  groups  of  influential  people  within 
STW.  Every  year  about  50  per  cent  of  the  peers  are  new.  Jury 
members  serve  only  once.  The  STW  board  does  not  set  additional 
priorities  once  the  priority  rating  has  been  established  by  the 
external  assessors. 

Opinions  on  the  quality  of  the  proposed  research  can  differ 
considerably.  STW  has  performed  many  studies  to  ascertain  whether 
the  STW  process  really  works.  They  have  checked  the 
reproducibility of the jury judgement. They have also checked that
their  procedure  does  not  discriminate  with  regard  to  age  or  budget. 
Their  evaluation  of  the  research  results  10  years  after  the 
proposal  was  granted  shows  that  there  is  a  correlation  between  the 
outcomes  and  the  jury's  assessment  of  the  utilization  potential. 
Furthermore,  their  jury  system  ensures  that  original  proposals 
receive  grants,  which  would  not  be  the  case  if  STW  had  relied 
solely on bibliometric indicators (see van den Beemt & van Raan,
1995).

After  a  proposal  has  been  granted,  STW  immediately  forms  a 
users'  committee  for  that  particular  research  project.  The 
committee  meets  twice  a  year  at  the  university  where  the  research 
is  taking  place.  The  research  team  gives  an  overview  of  their 
work,  and  discusses  this  with  the  'users'.  The  'users'  are  mainly 
experts,  but  sometimes  they  are  managers  and/or,  if  appropriate, 
government  representatives.  STW  regards  this  as  an  effective 
partnership.  Most  funding-agencies  (after  granting  a  project) 
neglect  this  aspect  of  the  process  and  ask  only  for  annual  reports 
on  the  granted  research  project  or  they  visit  the  groups  once  every 
two  years.  STW,  on  the  other  hand,  constantly  involves  the 
potential  users  from  society  as  the  research  progresses.  They 
evaluate  the  projects  one  year  and  six  years  after  the  project  has 
ended.

STW  concludes  that  Peer  Review  can  be  relevant  when  it 
involves  more  than  5  peers  and  they  are  asked  only  for  their 
comments.  The  comments  of  peers  need  to  be  assessed  by  a  number  of 
highly qualified people (non-peers). STW believes that the people
involved  in  the  peer  and  jury  procedures  must  not  meet  and  must 
work  by  mail.  STW  believes  that  it  is  not  a  good  idea  to  work  with 
fixed  groups  of  peers  and  jury  members.  STW  also  believes  that 
bibliometric  indicators  have  nothing  to  do  with  scientific  quality; 
they  simply  indicate  numbers  of  publications  and  citations.  They 
should  not  be  used  for  the  assessment  of  research  proposals. 

PEER REVIEW PRACTICES: EXISTING PROGRAMS

There  are  many  approaches  used  by  research  sponsoring 
organizations  to  conduct  periodic  peer  reviews  to  monitor  the 
quality  and  potential  impact  of  ongoing  research  [Salasin,  1980; 
Logsdon,  1985;  DOE,  1993;  Kostoff,  1995a;  Ormala,  1989;  Cozzens, 
1987; Kerpelman, 1985; Luukkonen-Grunow, 1987; OTA, 1986]. This
section focuses on selected peer review approaches which reflect
the state of the art in the technical community and places special
emphasis on how research impact is incorporated into the peer
review process. The first case study is the DOE review of its
Office of Basic Energy Sciences (BES), and the evolution of that
approach into present DOE practice. The second case study focuses
on the ONR methods used to review extramural and intramural
programs. The third and fourth case studies relate to the annual
reviews of the National Institute of Standards and Technology
(NIST) and the Army Research Laboratory (ARL) by the National
Academy of Sciences (NAS), and the fifth case study relates to the
annual  review  of  the  DOE  national  laboratories  by  the  field 
offices.  The  final  case  study  describes  an  approach  used  by  the 
author  to  evaluate  a  program  of  small  high-risk  seed  money 
projects. 

In  1981,  the  DOE  performed  an  assessment  of  existing  projects 
funded by its Office of Basic Energy Sciences [DOE, 1982; Kostoff,
1988].  Out  of  approximately  1200  active  projects  supported  by  BES, 
a  randomly  selected  sample  of  129  projects  was  reviewed  by  panels 
of  scientific  peers.  The  projects  were  grouped  by  areas  of  similar 
science,  and  the  reviews  were  conducted  on  40  separate  days  by  40 
separate  expert  panels,  with  an  average  of  four  members  and  three 
projects  per  panel.  The  reviewers  were,  for  the  most  part,  bench 
level  scientists  independent  of  the  DOE. 

The  reviewers  were  asked  to  rate  seven  factors  for  each 
project: 


1. Team Quality (TQ);

2. Scientific Merit (SM);

3. Scientific Approach (SA);

4. Productivity (P);

5. Importance to Mission (IM);

6. Energy Impact (EI);

7. Overall Project Quality (OPQ).

The  three  evaluation  factors  on  the  scoresheet  which  related 
to potential research impact were SM, IM, and EI. SM incorporated
the potential impact of the research on allied research fields. IM
covered the types of ways in which a research project could
contribute to the Nation's energy needs. EI was the probable
impact  of  the  research  project  on  energy  development,  conservation, 
or  use. 

After  the  scoring  by  the  panels  was  completed,  all  possible 
linear regression models (ranging from six factors to one factor)
were used to relate the OPQ rating factor (essentially the
reviewers' bottom line score on each project) to the other rating
factors for the 129 projects. The six-factor model produced a
correlation coefficient of 0.89, which meant that the six factors
selected constituted the bulk of the considerations which the
reviewers used to score the OPQ rating factor. In fact, the best
three-factor model derived to predict the OPQ rating factor score,
consisting of TQ, SA, and IM, produced correlation coefficients
within three percent of the complete six-factor model [DOE, 1982].
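
The "all possible regressions" procedure described above can be sketched in a few lines. The sketch below is not the DOE analysis: the panel scores are synthetic (generated so that three of the six factors drive the overall score), the function names are hypothetical, and the multiple correlation coefficient is computed as the correlation between fitted and observed OPQ values from an ordinary least-squares fit with an intercept.

```python
from itertools import combinations

import numpy as np

FACTORS = ["TQ", "SM", "SA", "P", "IM", "EI"]


def multiple_r(X, y):
    """Multiple correlation coefficient of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.corrcoef(A @ coef, y)[0, 1])


def best_subsets(X, y, names):
    """For each model size k, return the factor subset with the highest R."""
    best = {}
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            r = multiple_r(X[:, list(cols)], y)
            if k not in best or r > best[k][1]:
                best[k] = ([names[c] for c in cols], r)
    return best


# Synthetic stand-in for 129 project scoresheets (NOT the BES data).
rng = np.random.default_rng(0)
scores = rng.uniform(1.0, 10.0, size=(129, len(FACTORS)))
opq = (0.5 * scores[:, 0] + 0.3 * scores[:, 2] + 0.2 * scores[:, 4]
       + rng.normal(0.0, 0.5, size=129))

best = best_subsets(scores, opq, FACTORS)
for k in sorted(best, reverse=True):
    subset, r = best[k]
    print(f"{k}-factor model: {', '.join(subset):<24} R = {r:.3f}")
```

With six candidate factors there are only 63 possible models, so exhaustive enumeration is cheap. Comparing the best three-factor R against the six-factor R is the same comparison the reviewers' scores were subjected to in the text.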

An updated version of the BES evaluation approach is used by
the  DOE  Office  of  Program  Analysis  to  conduct  peer  review 
assessments  of  DOE  research  and  development  [DOE,  1993].  Now, 
after  a  panel  has  completed  the  evaluation  of  all  the  projects 
assigned  to  it,  the  members  are  asked  to  identify  research  needs  or 
opportunities  available  to  the  DOE  research  program.  Since  the 
panel  members  are  very  familiar  with  the  program  strengths  and 
weaknesses  at  this  point  in  the  review,  the  opportunities  and  needs 
that  they  identify  should  be  viewed  as  highly  relevant  and 
credible. 

Each  of  ONR's  review  processes  has  a  major  peer  evaluation 
component  adapted  to  meet  the  particular  needs  of  the 
organizational  unit  under  review.  The  two  reviews  described  here 
are  those  of  ONR's  two  largest  research  claimants  circa  1992,  the 
Research  Programs  Department  (RPD)  and  the  Naval  Research 
Laboratory (NRL).

The  RPD  sponsored  extramural  basic  research  mainly  at 
universities,  and  consisted  of  13  Divisions  organized  along  science 
disciplines.  Two  separate  groups  contributed  to  the  one  day  annual 
review  of  each  Division.  One  group  was  the  Division's  Board  of 
Visitors (BOV), which represented academia, industry, and non-ONR
government.  The  majority  of  the  BOV  were  members  of  the  research 
community,  but  typically  the  BOV  would  include  representatives  from 
the  technology  development  community  and  the  operational  Navy.  The 
other  group  contributing  to  the  review  was  the  Research  Advisory 
Board,  the  senior  management  of  the  RPD  whose  backgrounds  spanned 
a  wide  range  of  scientific  disciplines. 

For  the  review,  the  Division  Director  overviewed  the  total 
Division,  including  programs,  accomplishments,  new  opportunities, 
and  management  issues.  The  Division's  program  managers  described 
their  programs  in  detail,  including  the  impact  on  science  of  their 
accomplishments,  potential  or  ongoing  transitions  of  their  programs 
to  development  programs,  some  bibliometric  measures  such  as 
publications,  and  potential  impacts  on  the  Navy  if  successful.  The 
reviewers  filled  out  comment  sheets,  focusing  on  Scientific  Merit, 
Technical  Approach,  and  Potential  Naval  Impact,  and  later  discussed 
their  findings  with  the  RPD  management. 

Almost  all  of  the  NRL's  programs  are  intramural,  and  it 
conducts  full  spectrum  research  in  60  task  areas.  On  average,
about  20  task  areas  will  be  reviewed  per  year,  with  4  or  5  of  these 
task  areas  reviewed  using  external  reviewers,  and  the  remainder 
reviewed  by  an  internal  NRL  management  group  called  the  Research 
Advisory  Committee  (RAC).  The  external  review  group  represents
academia,  industry,  and  non-NRL  government.  The  RAC  consists  of 
NRL  senior  management  whose  backgrounds  span  a  broad  range  of 
science  disciplines. 

The  Coordinator  of  the  task  area  reviewed  by  the  external 
panel  overviews  the  task  area  and  investment  strategy.  Then,  the 
principal  investigators  of  the  task  area  describe  their  work  in 
detail,  including  the  impact  of  their  science  accomplishments  on
the  task  area  and  allied  science  fields,  transitions  to  more
applied  categories,  bibliometric  measures  such  as  publications  and 
presentations,  and  potential  impact  of  their  research  on  the  Navy. 

The  reviewers  fill  out  comment  sheets,  focusing  on  Scientific 
Merit,  Technical  Approach,  and  Potential  Naval  Impact,  and 
afterward  visit  and  review  facilities.  The  reviewers  draft  a 
report  and  meet  with  ONR  management  and  members  of  the  RAC  to 
present  their  preliminary  findings.  The  remaining  task  areas  are 
reviewed  in  detail  by  the  RAC. 

NIST  is  reviewed  annually  by  two  external  groups,  a  general 
policy  and  management  review,  and  a  detailed  technical  review.  The 
Visiting  Committee  on  Advanced  Technology  reviews  general  policy, 
organization,  budget,  and  programs  of  NIST.  The  Committee  submits
an  annual  report  [NIST,  1991a]  which  includes  reviews  of  progress 
in  NIST's  science,  engineering  and  technology  transfer  programs. 

The  National  Academy  of  Sciences'  (NAS)  Board  on  Assessment  of 
NIST  Programs  performs  a  detailed  technical  review  [NIST,  1991b].
Seventeen  panels  of  reviewers  (about  ten  people  per  panel)  from 
industry  and  academia  conduct  program  reviews  based  on  2-  or  3-day
site  visits  at  NIST  facilities.  The  panels  address  variants  of 
research  quality,  and  because  of  NIST's  unique  charter  in 
supporting  competitiveness,  pay  particular  attention  to  technology 
transfer,  industrial  coupling,  and  emerging  technologies.  While 
quantitative  indicators  of  research  impact  are  not  addressed  in  the 
panels'  annual  reports  [NIST,  1991b],  impacts  of  the  research  on 
technology  and  competitiveness  are  addressed  extensively. 
Recommendations  for  improvement  in  these  impact  areas  are  provided. 

Recently,  the  ARL  contracted  with  the  NAS  to  establish  a 
Technical  Assessment  Board  (TAB)  and  associated  review  panels  for 
the  purposes  of  evaluating  the  quality  of  the  ongoing  research, 
assessing  the  state  of  the  laboratory's  facilities,  and
appraising  the  level  of  preparedness  and  functioning  of  the 
technical  staff.  The  TAB  has  15  members  with  expertise  in  fields 
aligned  with  ARL's  six  business  areas  (Vehicle  Technologies, 
Weapons  and  Materials  Research,  Information  Science  and  Technology, 
Sensors  and  Electronic  Devices,  Human  Research  and  Engineering, 
Survivability  and  Lethality  Analysis),  and  its  members  come  mainly
from  academia  and  industry.  The  NAS  established  six  review  panels
(one  for  each  business  area),  each  one  consisting  of  about  ten
members  including  some  TAB  members.  Each  panel  reviews  one  third 
of  the  program  in  its  business  unit  area  per  year;  each  full 
business  unit  is  therefore  reviewed  on  a  three  year  cycle.  Each 
review  consisted  of  a  two  day  site  visit  by  the  panel.  The  review 
included  briefings  on  technical  projects,  touring  the  lab  to  assess 
the  facilities  and  equipment,  interacting  personally  with  the 
research  staff,  and  reviewing  those  portions  of  the  ARL  extended 
program  being  conducted  with  private  sector  partners  under  a 
Cooperative  Agreement  (Federated  Laboratory;  in  essence,  the 
addition  of  virtual  lab  divisions).  An  annual  report  contains  the
review  results  [Brown,  1997]. 

The  DOE  has  nine  contractor-operated  multiprogram 
laboratories.  Each  contractor's  laboratory  management  performance
is  evaluated  annually  by  the  DOE  Field  Office  (FO)  to  which  each
laboratory  is  assigned  [DOE,  1988].  The  FO  prepares  an  appraisal 
plan  for  the  laboratory,  which  focuses  on  laboratory  performance  in 
four  areas: 

1.  Institutional  Management  Performance,  which  includes 
different  aspects  of  overall  lab  management; 

2.  Programmatic  Performance,  which  includes  R&D  achievements; 

3.  Operations  Support  Performance,  which  includes  technical 
functions  which  support  mission  objectives; 

4.  Administrative  Performance,  which  includes  business 
management  functions. 

In  the  programmatic  performance  areas,  sources  of  input 
include  DOE  program  officials,  other  agencies  having  substantial 
work  at  the  laboratory,  and  FO  program  managers.  For  this  annual 
review,  DOE  will  utilize  information  from  its  own  program  advisory 
committees  on  the  adequacy  and  impact  of  the  laboratory's  R&D 
efforts  in  relation  to  the  overall  DOE  program.  Furthermore,  DOE 
will  use  the  reports  of  the  scientific  peer  review  committees 
established  by  the  contractor,  which  provide  an  assessment  of  the 
quality  of  the  laboratory's  R&D  programs. 

There  appears  to  be  no  formal  requirement  for  using  teams  of 
external  reviewers  for  the  technical  programs  as  in  the  ONR  and 
NIST  reviews;  rather,  most  input  seems  to  come  from  the  sponsors. 
Estimations  of  research  impact  appear  to  derive  from  the  DOE 
program  advisory  committees  and  peer  review  assessments,  which  may
be  reflected  in  the  annual  appraisal. 

In  Europe,  panel  reviews  have  evolved  where  users  of  the 
research  results  together  with  scientific  peers  assess  the  impact 
of  the  research  on  scientific  progress  and  industrial  or  social 
development.  Another  development  line  has  been  to  commission 
evaluation  experts  either  to  support  panels  or  to  conduct 
independent  assessments  which  may  involve  surveys,  in-depth 
interviews,  case  studies,  etc.  [Ormala,  1994].  A  1992  publication
[Barker,  1992]  describes  how  evaluation  experts  coming  from  two 
main  communities  (civil  servants  and  academic  policy  researchers) 
interact  in  evaluation  of  R&D  in  the  UK.  The  performance  of 
evaluations,  including  the  synthesis  of  evidence  and  the  production 
of  conclusions  and  recommendations,  is  done  by  professionals,  as 
opposed  to  panels  of  eminent  persons.  No  comparisons  of  reviews  by 
the  professionals  with  those  of  eminent  persons  are  presented. 

IV-A-4.  PEER  REVIEW  PROTOCOLS 

The  previous  parts  of  this  section  have  focused  on  concepts, 
principles,  and  issues  related  to  research  program  peer  review,  as 
well  as  examples  of  selected  federal  agency  peer  review  practices. 
The  remainder  of  this  section  incorporates  many  of  these  ideas  into 
a  sample  program  peer  review  process.  Sufficient  detail  is 
presented  such  that  an  organization  could  use  this  as  a  guide  to 
developing  a  review  process  most  appropriate  to  its  needs.  Most  of
the  procedures  and  concepts  described  have  been  tested  and  found  to
produce  very  useful  results. 

Program  Review  Options 

The  guiding  principle  for  review  options  is  that  evaluation 
should  occur  along  the  same  structures  and  taxonomies  by  which  the 
research  is  planned  and  executed.  If  the  agency  has  a  separate 
research  unit,  then  the  discipline  should  be  evaluated  as  an 
integrated  whole.  In  the  nominal  intra-agency  review,  quality  and 
relevance  could  be  evaluated  concurrently  or  separately,  as  desired 
by  the  agency. 

If  research  is  vertically  integrated  with  development,  then 
the  research  could  be  evaluated  as  part  of  a  total  vertical 
structure  R&D  review  [Kostoff,  1995a]  or  as  part  of  the  discipline, 
as  desired  by  the  agency.  In  the  nominal  intra-agency  review, 
quality  and  relevance  could  be  evaluated  separately  or 
concurrently.  A  key  conclusion  to  be  drawn  from  this  paragraph  is 
that  research  evaluation  recommendations  must  take  into  account  how 
research  is  structured,  integrated,  and  managed  within  an  agency. 

Desirable  characteristics  of  a  high  quality  peer  review  were 
listed  previously  under  the  Objectives  section.  The  research 
programs  should  be  reviewed  on  a  triennial  cycle,  based  on  the  DOE 
BES  evaluation  results  of  1982  [DOE,  1982],  and  on  other  agency 
practices. 

The  following  considerations  apply  to  a  concurrent  quality  and 
relevance  review.  The  reviewers  should  be  external,  have  minimal 
conflicts  with  the  program  being  reviewed,  and  should  be  selected 
with  expertise  in  all  facets  of  the  research  and  potential  impact 
areas.  To  evaluate  the  degree  of  horizontal  coupling  in  the 
nominal  intra-agency  review,  representatives  of  other  Federal 
agencies  should  be  considered  as  reviewers,  or  at  least  should  be
invited  to  participate  as  audience  members.  Thus,  the  review  panel 
will  be  a  heterogeneous  mixture  of  research  and  relevance  experts 
who  can  address  the  many  facets  of  the  science  and  areas  of 
potential  impact. 

In  the  nominal  concurrent  quality  and  relevance  review, 
quality  and  relevance  should  be  the  main  review  criteria.  Research 
quality  criteria  should  include  research  merit,  research  approach, 
productivity,  and  team  quality.  Relevance  criteria  should  include 
short  term  impact  (transitions  and/or  utility),  long  term  potential
impact,  and  some  estimate  of  the  probability  of  success  of 
attaining  each  type  of  impact. 
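
As one possible way of recording the criteria just listed, the sketch below averages reviewer comment-sheet scores by criterion and then by group. The 1-to-5 scale, the equal weighting, and the function name summarize are assumptions made for illustration; the probability-of-success estimates are omitted for brevity.

```python
from statistics import mean

# Criteria named in the text; the numeric 1-5 scale is an assumption.
QUALITY = ("research merit", "research approach", "productivity", "team quality")
RELEVANCE = ("short term impact", "long term potential impact")


def summarize(sheets):
    """Average each criterion over reviewers, then average within each group."""
    per_criterion = {c: mean(s[c] for s in sheets) for c in QUALITY + RELEVANCE}
    return {
        "quality": mean(per_criterion[c] for c in QUALITY),
        "relevance": mean(per_criterion[c] for c in RELEVANCE),
        "by_criterion": per_criterion,
    }


# Two hypothetical reviewer sheets for a single program.
sheets = [
    {"research merit": 4, "research approach": 3, "productivity": 5,
     "team quality": 4, "short term impact": 2, "long term potential impact": 4},
    {"research merit": 2, "research approach": 3, "productivity": 3,
     "team quality": 4, "short term impact": 4, "long term potential impact": 2},
]
summary = summarize(sheets)
print(summary["quality"], summary["relevance"])
```

Keeping quality and relevance as separate summaries, rather than collapsing them into one number, mirrors the option of evaluating them concurrently or separately.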

There  should  be  an  overview  showing  how  the  larger  management 
unit  (Division,  Department,  etc.)  in  which  the  programs  are  housed 
integrates  into  the  total  organization,  and  how  the  management 
unit's  objectives  relate  to  those  of  the  larger  organization. 
Then,  the  investment  strategy  of  the  larger  management  unit  should 
be  presented  in  detail.  This  would  include  the  relative  program 
priorities,  the  actual  investment  allocation  to  the  different 
programs,  and  the  rationale  for  the  investment  allocation. 
Finally,  for  each  program  presentation,  the  investment  strategy  for 
its  thrust  areas  should  be  presented.

The  investment  strategy  is  perhaps  the  most  crucial  part  of  a
program  review,  and  deserves  further  discussion  here.  While 
investment  is  the  allocation  of  resources  among  the  program 
components,  the  investment  strategy  is  the  rationale  for  the 
prioritization  and  allocation  of  resources  among  the  program 
components.  The  optimal  investment  strategy  for  a  program,  which 
should  be  a  focal  point  of  an  assessment,  is  that  allocation  and 
rationale  which  will  produce  the  most  mission  relevant  high  quality 
research  for  impacting  the  program's  objectives.  This  will  depend 
on  the  viewpoint  of  the  assessor,  and  in  particular  how  the 
assessor  limits  the  role  of  the  research  within  the  national 
perspective. 

The  optimal  investment  strategy  results  from  a  timely 
confluence  of  research  requirements  (top-down  driven)  and  promising 
research  opportunities  (bottom-up  driven).  Further,  promising
research  opportunities  result  from  a  timely  confluence  of  advances 
in  theory,  instrumentation,  new  experiments,  new  algorithms,  and 
computers.  Finally,  research  requirements  result  from  a  timely 
confluence  of  domestic  and  foreign,  political  and  economic, 
strategic  and  tactical  advances.  All  of  the  above  factors  should 
be  included  in  a  presentation  of  the  investment  strategy. 

While  the  emphasis  is  on  peer  review,  bibliometric  and  other 
types  of  indicators  should  be  utilized.  Also,  it  is  recommended
strongly  that  sufficient  background  material  be  supplied  to  the 
reviewers  before  the  review.  This  would  include  organizational 
descriptive  material,  narrative  descriptions  of  each  program  to  be 
reviewed,  and  descriptive  material  of  each  work  unit  in  the 
program.  It  would  also  prove  useful  to  include  bibliometric  output 
indicators  for  each  program,  with  interpretive  analytical  material. 
This  could  include  refereed  papers,  patents,  awards  and  honors, 
presentations,  etc.  It  would  be  useful  to  include  narrative 
material  on  related  programs  in  other  agencies  and  industry.  It 
would  be  useful  to  include  Hindsight-type  results  of  research  that 
was  funded  years  ago  in  the  discipline  under  review  and  which 
recently  came  to  fruition  in  a  system  or  commercial  technology.
Finally,  although  the  following  concept  has  never  been  tested  to 
the  author's  knowledge,  it  would  be  valuable  to  incorporate  the 
results  of  journal  manuscript  reviews  in  the  research  program  peer 
review  process. 

The  best  features  of  different  organizations'  peer  review 
practices  can  be  combined  with  some  of  the  principles  above  into  a 
protocol  for  the  conduct  of  successful  peer  review  research  program 
evaluations  and  impact  assessments.  The  main  aims  of  the  protocol 
are  to  insure  that  the  final  assessment  product  has  the  highest 
intrinsic  quality  and  that  the  assessment  process  and  product  are 
perceived  as  having  the  highest  possible  credibility.  The  protocol 
elements  are: 

1.  The  objectives  of  the  assessment  must  be  stated  clearly  and 
unambiguously  at  the  initiation  of  the  assessment  by  the  highest 
levels  of  management,  and  the  full  support  of  top  management  must 
be  given  to  the  assessment.  In  turn,  the  objectives,  importance, 
and  urgency  of  the  assessment  must  be  articulated  and  communicated
down  the  management  hierarchy  to  the  managers  and  performers  whose
research  is  to  be  assessed,  and  the  cooperation  of  these  reviewees 
must  be  enlisted  at  the  earliest  stages  of  the  assessment; 

2.  The  final  assessment  product,  the  audience  for  the  product, 
and  the  use  to  be  made  of  the  product  by  the  audience  should  be 
considered  carefully  in  the  design  of  the  assessment; 

3.  One  person  should  be  assigned  to  manage  the  assessment  at
the  earliest  stage,  and  this  person  should  be  given  full  authority 
and  responsibility  for  the  assessment; 

4.  The  assessment  manager  should  report  to  the  highest
organizational  level  possible  in  order  to  insure  maximum 
independence  from  the  research  units  being  assessed; 

5.  The  reviewers  should  be  selected  to  represent  a  wide 
variety  of  viewpoints,  in  order  to  address  the  many  different 
facets  of  research  and  its  impact  [Kostoff,  1988].  These  would 
include  bench-level  researchers  to  address  the  impact  of  the 
proposed  research  on  the  field  itself;  broad  research  managers  to 
address  potential  impact  on  allied  research  fields;  technologists 
to  address  potential  impact  on  technology  and  the  potential  of  the 
research  to  transition  to  higher  levels  of  development;  systems 
specialists  to  address  potential  impact  on  systems  and  hardware; 
and  operational  personnel  to  address  the  potential  impact  on 
downstream  organizational  operations.  The  reviewers  should  be 
independent  of  the  research  units  being  evaluated,  and  independent 
of  the  assessing  organization  where  possible.  The  objectives  of, 
and  constraints  on  (if  any),  the  assessment  should  be  communicated
to  the  reviewers  at  the  initial  contact; 

6.  Maximum  background  material  describing  the  research  to  be 
assessed,  related  research  and  technology  development  sponsored  by 
external  organizations,  the  organization  structure,  and  other 
factors  pertinent  to  the  assessment,  should  be  provided  to  the 
reviewers  as  early  as  possible  before  the  review.  This  will  allow 
the  reviewers  and  presenters  to  use  their  time  most  productively 
during  the  review; 

7.  Recommendations  resulting  from  the  assessment  should  be 
tracked  to  insure  that  they  are  considered  and  implemented,  where 
appropriate.  For  research  programs,  planning,  execution,  and 
review  are  linked  intimately.  Feedback  from  the  review  outcomes  to 
planning  for  the  next  cycle  should  be  tracked  to  insure  that  the 
review/planning  coupling  is  operable. 

Evaluations  should  be  performed  at  three  levels  of  resolution 
in  the  organization: 

1.  The  highest  level  would  be  an  annual  corporate  level  review 
of  how  the  organization  performs  research.  If  the  organization  has 
a  separate  research  unit,  then  the  unit  should  be  evaluated  as  an 
integrated  whole.  If  research  is  vertically  integrated  with 
development,  then  the  research  should  preferably  be  evaluated  as 
part  of  a  total  organization  R&D  review.  The  charter  of  this 
highest  level  assessment  would  be  to  review,  at  the  corporate 
level,  general  policy,  organization,  budget,  and  programs  (e.g., 
NIST,  [1991]).  Total  inputs  and  outputs,  including  integrated 
bibliometric  indicators,  would  be  examined.  Overall  research
management  processes  would  be  examined,  such  as  selection,
execution,  review,  and  technology  transfer  of  research.  The 
overall  investment  strategy  would  be  evaluated,  and  would  include 
different  perspectives  of  the  program,  such  as  technical 
discipline,  performer,  and  end  use  allocation.  The  integration  of 
the  research  objectives  with  the  larger  organization  objectives 
would  be  assessed.  The  evaluators  would  include,  but  not  be 
limited  to,  representatives  of  the  stakeholder,  customer,  and  user 
community  whose  potential  conflicts  with  the  organization  are 
minimal.

2.  The  second  level  would  be  a  triennial  peer  review  of  a
discipline  or  management  unit  at  the  program  level  (e.g.,  Kostoff, 
[1988,  1995a]),  where  a  program  is  defined  as  an  aggregation  of 
work  units  (Principal  Investigators).  If  the  organization  has  a
separate  research  unit,  then  the  discipline  should  be  evaluated  as 
an  integrated  whole.  In  the  nominal  review,  quality  and  relevance 
could  be  evaluated  concurrently.  If  research  is  vertically 
integrated  with  development,  then  the  research  should  preferably  be 
evaluated  as  part  of  a  total  vertical  structure  R&D  review.  In  the 
nominal  vertical  structure  review,  quality  and  relevance  should 
preferably  be  evaluated  separately.  Thus,  research  evaluation  must 
take  into  account  how  research  is  structured,  integrated,  and 
managed  within  an  organization.  Research  quality  criteria  should 
include  research  merit,  research  approach,  productivity,  and  team 
quality.  Relevance  criteria  should  include  short  term  impact 
(transitions  and/or  utility),  long  term  potential  impact,  and  some
estimate  of  the  probability  of  success  of  attaining  each  type  of 
impact.  While  the  emphasis  is  on  peer  review,  bibliometric  and 
other  types  of  indicators  should  be  utilized  to  supplement  the  peer
evaluation. 

3.  The  third  level  would  be  a  minimum  of  triennial  peer  review
at  the  work  unit  (Principal  Investigator)  level  (e.g.,  DOE, 
[1993]).  Most  of  the  program  level  issues  described  above  are 
applicable  and  need  not  be  repeated  here. 

For  each  of  these  three  levels  of  review,  the  following  criteria
and  issues  should  be  considered  during  the  review  as  appropriate.

1.  Quality  and  uniqueness  of  the  work 

2.  Scientific  and  technological  opportunities  in  areas  of 
likely  organization  mission  importance 

3.  Need  to  establish  a  balance  between  revolutionary  and 
evolutionary  work 

4.  Position  of  the  work  relative  to  the  forefront  of  other 
efforts 

5.  Responsiveness  to  present  and  future  organization  mission 
requirements 

6.  Possibilities  of  follow-on  programs  in  higher  R&D 
categories 

7.  Appropriateness  of  the  efforts  for  the  organization  as  opposed
to  other  organizations

8.  Coordination  with  related  work  in  other  organizations

The  following  questions  should  be  asked  of  organization
programs:

1.  What  is  the  investment  strategy  of  the  larger  management
unit?  This  would  include  the  relative  program  priorities,  the
actual  investment  allocation  to  the  different  programs,  and  the
rationale  for  the  investment  allocation.  For  each  program  being
reviewed,  what  is  the  investment  strategy  for  its  thrust  areas?

2.  What  are  we  trying  to  do  (in  a  systems  concept)? 

3.  Can  specific  advantage  to  the  organization  be  identified  if 
the  program  is  successful?

4.  How  is  the  system  done  today  and  what  are  the  limitations 
of  the  current  practice? 

5.  Would  the  work  be  supported  if  it  were  not  already 
underway? 

6.  Assuming  success,  what  difference  does  it  make  to  the  user 
in  a  mission  area  context?

7.  What  is  the  technical  content  of  the  program  and  how  does 
it  fit  with  other  ongoing  efforts  in  academia,  industry, 
organization  labs,  other  labs,  etc.? 

8.  What  are  the  decision  milestones  of  the  program? 

9.  How  long  will  the  program  take;  how  much  will  the  program 
cost;  what  are  the  mid-term  and  final  objectives  of  the  program? 

PEER  REVIEW  -  SUMMARY  AND  CONCLUSIONS

Peer  review  is  the  most  widely  used  and  generally  credible 
method  used  to  assess  the  impact  of  research.  Much  of  the 
criticism  of  peer  review  has  arisen  from  misunderstandings  of  its 
accuracy  resolution  as  a  measuring  instrument.  While  a  peer  review 
can  gain  consensus  on  the  projects  and  proposals  that  are  either 
outstanding  or  poor,  there  will  be  differences  of  opinion  on  the 
projects  and  proposals  that  cover  the  much  wider  middle  range.  For 
projects  or  proposals  in  this  middle  range,  their  fate  is  somewhat 
more  sensitive  to  the  reviewers  selected.  If  a  key  purpose  of  a 
peer  review  is  to  insure  that  the  outstanding  projects  and 
proposals  are  funded  or  continued,  and  the  poor  projects  are  either 
terminated  or  modified  strongly,  then  the  capabilities  of  the  peer 
review  instrument  are  well  matched  to  its  requirements. 
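
The resolution argument can be sketched as a simple triage rule: a project is labeled outstanding or poor only when reviewers agree, and everything else falls into the middle range. The 1-to-5 scale, the thresholds, and the function name triage are assumptions chosen for illustration, not values taken from any agency practice.

```python
from statistics import mean, pstdev


def triage(scores, high=4.0, low=2.0, max_spread=0.75):
    """Label a project from its reviewer scores (assumed 1-5 scale).

    Consensus is approximated by a small standard deviation across
    reviewers; all thresholds are illustrative assumptions.
    """
    avg = mean(scores)
    spread = pstdev(scores)
    if spread <= max_spread and avg >= high:
        return "outstanding"
    if spread <= max_spread and avg <= low:
        return "poor"
    return "middle range"


# Hypothetical reviewer scores for four projects.
for scores in ([5, 5, 4, 5], [1, 2, 1, 1], [2, 4, 3, 5], [5, 5, 1, 5]):
    print(scores, triage(scores))
```

Under a rule of this kind, the outcome for a middle-range project can change when one reviewer is replaced, which is the sensitivity described above.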

However,  the  value  of  peer  review  as  a  predictive  tool  for 
assessing  the  impact  of  research  on  an  organization's  mission 
(other  than  research  for  its  own  sake)  rests  on  faith  more  than  on 
hard  documented  evidence.  Also,  for  serious  panel-type  peer 
reviews  or  mail-type  peer  reviews,  where  sufficient  expertise  is 
represented  on  the  panels,  total  real  costs  will  dominate  direct 
costs.  The  major  contributor  to  total  costs  is  the  time  of  all  the 
players  involved  in  executing  the  review.  With  high  quality 
performers  and  reviewers,  time  costs  are  high,  and  the  total  review 
costs  can  be  a  non-negligible  fraction  of  total  program  costs, 
especially  for  programs  that  are  people  intensive  rather  than 
hardware  intensive. 
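
A worked example shows how time costs can dominate. All figures below (head counts, hours, the hourly rate, the direct costs, and the program budget) are invented for illustration, and the function name review_cost is hypothetical.

```python
def review_cost(people_hours, hourly_rate, direct_costs):
    """Return (time_cost, total_cost) for a review.

    people_hours maps each group of participants to its total hours.
    """
    time_cost = sum(people_hours.values()) * hourly_rate
    return time_cost, time_cost + direct_costs


# Hypothetical panel review of a people-intensive program.
hours = {
    "reviewers": 10 * 24,    # 10 reviewers, 24 hours each
    "presenters": 15 * 40,   # 15 presenters, 40 hours of preparation each
    "review staff": 200,
}
time_cost, total = review_cost(hours, hourly_rate=100, direct_costs=30_000)
program_cost = 2_000_000     # assumed annual program budget

print(f"time share of review cost: {time_cost / total:.0%}")
print(f"review cost as share of program: {total / program_cost:.1%}")
```

In this invented case the direct costs (travel, honoraria, and the like) are the smaller part of the total, which is the sense in which total real costs dominate direct costs.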

The  methods  that  were  described  include  criteria  which  address
the  impact  of  research  on  its  own  and  allied  fields,  as  well  as  on
the  mission  of  the  sponsoring  organization.  The  most  intensive  use 
of  peer  review  appears  to  be  the  NSF/NIH  processes  for  assessing 
proposals,  the  DOE  review  of  the  BES  program  at  the  principal 
investigator  level,  and  the  NAS  annual  review  of  NIST.  Nearer-term 
research  impacts  typically  play  a  more  important  role  in  the  review 
outcome  than  longer-term  impacts,  but  do  not  have  quite  the 
importance  of  team  quality,  research  approach,  or  the  research 
merit.  A  minimal  set  of  review  criteria  should  include  team 
quality,  research  merit,  research  approach,  and  a  criterion  related 
to  longer-term  relevance  to  the  organization's  mission.  More 
important  than  the  criteria  is  the  dedication  of  an  organization's 
management  to  the  highest  quality  objective  review,  and  the 
associated  emplacement  of  rewards  and  incentives  to  encourage 
quality  reviews. 

IV-B.  SEMI-QUANTITATIVE  METHODS

BACKGROUND  AND  OVERVIEW 

In  the  evaluation  of  research  impact,  a  spectrum  of  approaches
may  be  considered  [Wirt,  1974;  Salasin,  1980;  Logsdon,  1985; 
Kerpelman,  1985;  OTA,  1986;  Luukkonen-Gronow,  1987;  Averch,  1990; 
Hall,  1990;  Johnston,  1990;  OTA,  1991;  Kostoff,  1992a,  1993b].  At 
one  end  of  the  spectrum  are  the  subjective,  essentially  non- 
quantitative  approaches,  of  which  peer  review  is  the  prototype 
[Chubin,  1994,  1990;  Cozzens,  1987;  DOE,  1982,  1991;  Frazier,  1987; 
Johnston,  1990;  Kerpelman,  1985;  Kostoff,  1988;  Logsdon,  1985; 
Luukkonen-Gronow,  1987;  Ormala,  1989;  OTA,  1986;  Salasin,  1980]. 
At  the  other  end  of  the  spectrum  are  the  mainly  quantitative
approaches,  such  as  evaluative  bibliometrics  and  cost-benefit 
[Carpenter,  1980;  King,  1987;  MacRoberts,  1989;  Mansfield,  1991; 
Miller,  1992;  Narin,  1976,  1987a,  1987b;  White,  1989].  In  between 
are  what  can  be  termed  semi-quantitative  approaches  [Kostoff, 
1992a,  1992b,  1993d,  1994d,  1994j]. 

These  semi-quantitative  methods  make  little  use  of 
mathematical  tools  but  draw  on  documented  approaches  and  results 
wherever  possible.  They  have  limited  credibility  in  the  analytic 
community,  since  the  selection  of  innovations  to  be  analyzed  tends 
to  be  arbitrary  rather  than  mathematically  rigorous,  and  they  are 
viewed  more  as  anecdotal  approaches  than  serious  technical 
approaches.  Nevertheless,  in  practice,  some  of  these  approaches 
(namely,  studies  of  accomplishments  resulting  from  sponsored 
research  programs,  or  studies  of  systems  and  the  research  products 
which  were  eventually  converted  and  incorporated  into  those 
systems)  are  widely  used  by  the  research  sponsoring  organizations. 

Three  types  of  semi-quantitative  methods  used  by  the  Federal 
government  in  RIA  are  presented.  These  include  the  classic 
retrospective  method  (Project  Hindsight),  another  retrospective
approach  (Project  TRACES  and  follow-ons),  and  accomplishments  books
used  by  selected  research  sponsoring  organizations  (Office  of  Naval
Research,  Air  Force  Office  of  Scientific  Research,  Department  of
Energy  Office  of  Health  and  Environmental  Research,  Department  of
Energy  High  Energy  Physics  Program,  Defense  Advanced  Research
Projects  Agency).  The  strengths  and  weaknesses  of  each  approach
are  discussed.  One  goal  of  all  the  studies  presented  was  to 
identify  the  products  of  research  and  some  of  their  impacts.  In 
addition,  the  Hindsight,  TRACES,  and  ARPA  studies  tried  to  identify 
factors  which  influenced  the  productivity  and  impact  of  research. 
The  following  general  conclusions  about  the  role  and  impact  of 
basic  research  were  reached: 

1.  The  majority  of  basic  research  events  which  directly 
impacted  technologies  or  systems  were  non-mission  oriented  and 
occurred  many  decades  before  the  technology  or  system  emerged; 

2.  The  cumulative  indirect  impacts  of  basic  research  were  not
accounted  for  by  any  of  the  retrospective  approaches  published; 

3.  An  advanced  pool  of  knowledge  must  be  developed  in  many 
fields  before  synthesis  leading  to  an  innovation  can  occur; 

4.  Allocation  of  benefits  among  researchers,  organizations, 
and  funding  agencies  to  determine  economic  returns  from  basic 
research  is  very  difficult  and  arbitrary,  especially  at  the  micro 
level.


PROJECT  HINDSIGHT 

Project  Hindsight  was  established  by  the  Defense  Department  in 
1965  to  identify  those  management  factors  important  in  assuring 
that  research  and  technology  programs  are  productive  and  that 
program  results  are  used.  It  also  attempted  to  measure  the  overall 
increase  in  cost-effectiveness  in  the  current  generation  of  weapons 
systems  compared  with  that  of  their  predecessors  assignable  to  any 
part  of  the  Defense  Department's  investment  in  research  and  science 
and  technology  [DOD,  1969]. 

The  approach  taken  in  Project  Hindsight  was  retrospective. 
Twenty  arbitrarily-selected  recent  weapons  systems  and  major 
military  equipments  were  analyzed  by  (mainly  DOD  in-house)  teams  of 
technical  specialists.  Their  task  was  to  identify  applications  of 
science  and  technology  that  were  not  utilized  in  predecessor 
military  systems  designed  to  meet  roughly  the  same  requirements. 
The  evolution  of  the  new  technology  represented  in  each  system  was 
traced  back  in  time  to  critical  points  called  "research  or 
exploratory  development  (RXD)  Events".  The  RXD  Event  was  the  basic 
quantifying  unit  in  the  study  and  was  defined  as  the  occurrence  of 
a  novel  idea  and  the  subsequent  scientific  and  engineering  activity 
in  which  the  idea  was  examined  or  tested.  There  could  be  one  or 
two  RXD  Events,  or  an  extended  chain  of  them,  culminating  in  a 
device  or  component  found  in  a  particular  system. 

The  teams  of  specialists  identified  710  unique  RXD  Events, 
conducted  the  historical  traces,  and  described  and  documented  the 
related  activities  in  terms  of  the  differential  amount  of  knowledge 
that  accounted  in  part  for  the  increased  cost-effectiveness  of  the 
systems  analyzed  (compared  with  their  predecessors).  Project
Hindsight  concentrated  only  on  the  post  World  War  II  contributions 
of  science  and  technology  on  the  selected  systems.  Each  study  team
was  allowed  about  three  months  to  complete  its  research  on  each 
system. 

In  treating  the  sciences,  Hindsight  distinguished  (1)  the
basic  research  done  to  solve  a  specific  assigned  problem  from  (2) 
the  basic  research  done  to  expand  the  frontiers  of  scientific 
knowledge.  These  were  categorized  as  directed  and  undirected  basic 
research,  respectively.  It  was  found  that  RXD  Events  from  the 
directed  basic  research  category  emerged  in  systems  development 
approximately  nine  years  following  their  conception,  while  it  took 
twenty  or  more  years  for  some  events  from  the  undirected  category 
to  impact  development.  The  Hindsight  study  did  not  treat  in  any 
depth  the  contribution  from  undirected  basic  research,  since  many 
of  those  events  predated  the  time  span  of  the  project  [DOD,  1969].

Before  discussing  the  methodology  further,  some  of  the 
critical  findings  will  be  summarized.  The  identification  of  the 
RXDs  was  found  to  be  fairly  simple,  and  time  limitations  permitted 
only  a  fraction  to  be  uncovered  and  examined.  The  results  of 
research  in  science  were  most  frequently  exploited  when  the 
investigator  responded  to  recognized  needs  of  the  engineering 
community.  A  high  probability  of  utilization  involved  awareness  on 
the  part  of  the  scientist  concerning  who  in  the  engineering 
community  needed  the  knowledge,  and  on  the  part  of  the  interested 
engineers  as  to  which  specific  scientist  was  working  on  the 
problem. 

The  greatest  identified  payoff  in  terms  of  ideas  leading  to 
enhanced  weapons  systems  resulted  from  research  in  technology  -  and 
then,  where  the  research  scientist  or  engineer  was  intimately  aware 
of  problems  of  the  applications  engineer.  The  real  difference  in 
performance  between  a  weapon  system  and  its  predecessor  was  usually 
not  the  consequence  of  one,  two,  or  three  scientific  advances  or
technological  capabilities  but  was  the  synergistic  effect  of  100, 
200,  or  300  advances,  each  of  which  alone  was  relatively 
insignificant.  These  hundreds  of  diverse  advances  must  then  be 
fitted  and  adjusted  for  a  unified  operational  weapon  system.  The 
characteristics  of  each  advance  must  be  carefully  interfaced  with 
those  of  other  advances.  Project  Hindsight  data  showed  that 
systems  applications,  rather  than  new  science,  inspired  science  and 
technology  for  advanced  systems. 

While  criticisms  of  a  project  of  the  complexity  and  scope  of 
Hindsight  are  possible,  Hindsight  was  a  reasonable  first  step  in
assessing  the  impact  of  applied  research  and  technology  development 
on  weapons  systems.  The  question  is  whether  the  Hindsight  approach 
and  conclusions  were  appropriate  for  evaluating  the  impact  of  basic 
research  on  weapons  systems,  or  whether  the  study  groundrules  and 
constraints  contained  built-in  biases  against  basic  research. 

The  most  obvious  limitation  of  Hindsight  relating  to  basic 
research  is  the  time  frame.  A  reading  of  the  Hindsight  report 
Appendices  shows  that  most  of  the  RXDs  occurred  in  the  1950s,  with 
few  in  the  '40s  and  '60s.  Since  many  fundamental  research  projects 
could  require  more  than  two  decades  for  their  results  to  impact 
systems  (especially  two  decades  ago  when  dissemination  of  results 
did  not  have  the  benefit  of  today's  communication  channels  and
systems),  the  cut-off  on  time  span  could  have  precluded  the
inclusion  of  research  impacts.  If  an  updated  Hindsight  study  were 
performed,  the  time  problem  could  be  alleviated  by  increasing  the 
retrospective  time  span  allowed.  Thus,  the  time  span  problem  is 
not  a  flaw  or  limitation  of  the  generic  retrospective  process,  but 
rather  is  associated  with  the  particular  Hindsight  implementation. 

A  more  serious  limitation  relates  to  the  RXD  approach.  The 
RXDs  are  identifiable  advances  which  draw  upon  the  pool  of 
technical  knowledge  in  existence  at  that  time.  But  the  pool  of 
knowledge  is  continually  increasing,  and  the  components  of  this 
pool  are  highly  interrelated,  both  directly  and  indirectly.  For 
example,  advances  in  basic  materials  understanding  may  be  dependent 
upon  advances  in  physics,  chemistry,  mathematics,  computer 
technology,  laser  technology,  computer  algorithms,  etc.  Some  of 
these  impacts  are  direct,  most  are  indirect. 

Thus,  any  RXD  could  theoretically  be  shown  to  be  impacted 
directly  or  indirectly  by  small  (or  in  some  cases  large)  advances 
in  the  component  basic  research  of  the  knowledge  pool.  While  the 
direct  or  indirect  impact  of  any  one  basic  research  component  on 
any  one  RXD  may  be  small  (if  it  were  large  and  within  the  time 
span,  it  would  have  been  identified  as  an  RXD),  the  total  direct
and  indirect  impact  of  this  basic  research  component  on  all  the 
RXDs  may  not  be  small.  These  cumulative  indirect  and  direct
impacts  of  basic  research  are  not  accounted  for  by  the  Hindsight
methodology,  and  in  fact  are  not  taken  into  account  by  any  of  the
retrospective  approaches  published  or  in  use  today.  A  study
[Kostoff,  1991c,  1992a,  1994i;  section  on  Network  Modeling  for
Direct/Indirect  Impacts  later  in  Handbook]  which  examined  impacts
of  research  on  other  research  and  technology  through  direct  and
indirect  paths  using  a  network  approach  showed  that  the  indirect
impacts  of  fundamental  research  can  be  very  large  in  a  cumulative
sense.  For  Hindsight,  the  indirect  impacts  would  have  been  even 
larger  if  the  actual  larger  number  of  RXDs  had  been  examined. 
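
The difference between direct and cumulative (direct plus indirect) impact can be illustrated with a minimal sketch. The graph below is entirely hypothetical: the node names, the edge weights, and the rule of multiplying weights along a path and summing over paths are assumptions made for illustration, and are not the method or data of Hindsight or of the network studies cited above.

```python
from collections import defaultdict

# Hypothetical acyclic graph: (a, b, w) means advance `a` contributed a
# fraction `w` to advance `b`. All names and weights are invented.
EDGES = [
    ("physics", "materials", 0.3),
    ("chemistry", "materials", 0.2),
    ("physics", "lasers", 0.5),
    ("materials", "rxd_1", 0.4),
    ("lasers", "rxd_2", 0.3),
    ("materials", "rxd_2", 0.1),
]


def build_graph(edges):
    """Map each source node to its list of (target, weight) pairs."""
    graph = defaultdict(list)
    for src, dst, weight in edges:
        graph[src].append((dst, weight))
    return graph


def direct_impact(graph, source, target):
    """Weight of the single-step edges from source to target only."""
    return sum(w for nxt, w in graph.get(source, []) if nxt == target)


def total_impact(graph, source, target):
    """Sum, over every path from source to target, of the product of the
    edge weights along that path (assumes the graph has no cycles)."""
    if source == target:
        return 1.0
    return sum(w * total_impact(graph, nxt, target)
               for nxt, w in graph.get(source, []))


if __name__ == "__main__":
    g = build_graph(EDGES)
    for rxd in ("rxd_1", "rxd_2"):
        print(rxd, direct_impact(g, "physics", rxd),
              round(total_impact(g, "physics", rxd), 3))
```

In this toy graph the "physics" node has no direct edge to either RXD, so an event-counting study would give it no credit, yet its summed path contributions are nonzero for both.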

The  Hindsight  conclusions  relative  to  the  impact  of  basic 
research  have  to  be  seen  in  perspective.  The  conclusion  to  be 
drawn  from  the  study  is  that  fundamental  research  had  little  direct 
impact  on  selected  weapons  systems  (whose  degree  of  design 
conservatism,  which  could  impact  implementation  speed  of 
revolutionary  concepts,  was  not  stated  or  evaluated)  in  a  time 
period  threshold  two  decades  before  weapon  system  implementation. 
Had  the  time  period  threshold  been  expanded,  and  indirect  impacts 
of  the  basic  research  been  incorporated  into  the  study,  then  a 
conclusion  could  have  been  drawn  about  the  total  impact  of  the 
basic  research  on  weapon  systems.  However,  had  the  question  about 
impact  been  raised  from  the  basic  research  component  viewpoint,  and 
an  appropriate  study  been  done  (of  which  Hindsight  would  have  been 
one  part),  then  conclusions  could  have  been  drawn  about  total
impact  of  the  basic  research  component  on  all  technology  and 
systems,  of  which  the  Hindsight  weapons  systems  were  one  part. 

TRACES 

The  Original  TRACES  Study

In  1967,  the  National  Science  Foundation  (NSF)  instituted  a
study  to  trace  retrospectively  key  events  which  had  led  to  a  number
of  major  technological  innovations  (Technology  in  Retrospect  and
Critical  Events  in  Science  -  TRACES).  One  goal  was  to  provide  more
specific  information  on  the  role  of  the  various  mechanisms, 
institutions,  and  types  of  R&D  activity  required  for  successful 
technological  innovation  [IITRI,  1968]. 

The  study  performers,  Illinois  Institute  of  Technology 
Research  Institute  (IITRI),  chose,  in  their  view,  a  representative
cross  section  of  research  and  development  for  study  and  treated  all 
cases  uniformly.  The  five  innovations  selected  were:  Magnetic 
Ferrites,  Video  Tape  Recorder,  Oral  Contraceptive  Pill,  Electron 
Microscope,  and  Matrix  Isolation.  Key  'events'  in  the  research  and 
development  history  of  each  innovation  were  identified,  an  'event' 
being  defined  as  the  point  at  which  a  published  paper, 
presentation,  or  reference  to  the  research  was  made.  The  research 
and  development  activities  on  the  five  tracings  were  grouped  by 
category  of  research  (mission,  non-mission),  type  of  institution,
date  of  event,  etc.,  to  bring  out  some  of  the  factors  which  entered
into  the  transition  from  non-mission  research  to  innovation. 

The  study  showed  that  non-mission  research  provided  the 
origins  from  which  science  and  technology  could  advance  toward 
innovations.  It  also  showed  that,  of  the  341  key  research  and 
development  events  judged  to  be  important  to  the  evaluation  of 
innovation,  approximately  70  percent  were  non-mission  research,  20 
percent  mission-oriented  research,  and  10  percent  development  and 
application.  The  number  of  non-mission  events  peaked  significantly 
between  the  twentieth  and  thirtieth  year  prior  to  an  innovation, 
while  mission-oriented  research  events  and  those  in  the  development 
and  application  area  peaked  during  the  decade  preceding  innovation. 
For  the  cases  studied,  the  average  time  from  conception  to
demonstration  of  an  innovation  was  nine  years. 

Ten  years  prior  to  an  innovation,  i.e.,  shortly  before 
conception,  approximately  90  percent  of  the  non-mission  research 
had  been  accomplished;  most  non-mission  research  appeared  completed 
prior  to  the  conception  of  the  innovation  to  which  it  would 
ultimately  contribute.  The  tracings  also  revealed  cases  in  which 
mission-oriented  research  or  development  efforts  elicited  later 
non-mission  research  which  often  was  found  to  be  crucial  to  the 
ultimate  innovation. 

There  are  a  number  of  interesting  comparisons  to  be  made
between  TRACES  and  Hindsight.  First,  the  TRACES  time  frame  extends 
back  sufficiently  far  to  include  many  basic  research  results,  while 
the  Hindsight  time  span  was  able  to  include  most  development 
events,  but  excluded  most  basic  research  results.  Hindsight  traced 
the  impacts  on  weapons  systems,  whereas  TRACES  examined  the  impact 
on  single  technologies.  Thus,  the  Hindsight  starting  point,  a 
weapons  system,  is  one  level  higher  (consists  of  many  single 
technologies)  than  the  TRACES  starting  point.  Coupled  with  the 
fact  that  the  Hindsight  weapons  systems  had,  on  average,  35  events, 
and  the  TRACES  innovations  had,  on  average,  70  events,  it  is  not 
surprising  that  the  Hindsight  events  tended  to  be  applied  research
or  technology  advances,  whereas  the  TRACES  events  tended  to  be  more 
basic  research.  In  neither  case  were  indirect  impacts  of  basic 
research  given  formal  credit,  although  the  TRACES  study  did  allude 
to  non-mission  research  as  "a  fund  of  knowledge  against  which 
withdrawals  can  be  made  to  achieve  innovation  at  a  rate 
satisfactory  to  society"  [IITRI,  1968].
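
The per-study averages quoted above can be checked against the totals given earlier in this section (710 RXD Events over twenty systems for Hindsight; 341 key events over five innovations for TRACES). The sketch below only repeats that arithmetic; the approximate event counts per category are derived from the rounded percentages in the text and are not figures reported by either study.

```python
# Totals as stated in the text of this section.
HINDSIGHT_EVENTS = 710
HINDSIGHT_SYSTEMS = 20
TRACES_EVENTS = 341
TRACES_INNOVATIONS = 5

# Approximate category shares quoted for the original TRACES study.
TRACES_SHARES = {
    "non-mission": 0.70,
    "mission-oriented": 0.20,
    "development and application": 0.10,
}


def average_events(total_events, n_cases):
    """Mean number of events per system or innovation."""
    return total_events / n_cases


def approximate_counts(total_events, shares):
    """Event counts implied by rounded percentage shares (approximate)."""
    return {name: round(total_events * share) for name, share in shares.items()}


if __name__ == "__main__":
    print(average_events(HINDSIGHT_EVENTS, HINDSIGHT_SYSTEMS))
    print(average_events(TRACES_EVENTS, TRACES_INNOVATIONS))
    print(approximate_counts(TRACES_EVENTS, TRACES_SHARES))
```

The results, about 35.5 and 68.2 events per case, are consistent with the rounded values of 35 and 70 used in the comparison.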

TRACES  Follow-on  Study 

In  a  follow-on  study  to  TRACES,  the  NSF  sponsored  Battelle- 
Columbus  Laboratories  to  perform  a  case  study  examination  of  the 
process  and  mechanism  of  technological  innovation  [Battelle,  1973]. 
For  each  of  the  ten  innovations  studied  (Heart  Pacemaker,  Hybrid 
Corn,  Hybrid  Small  Grains,  Green  Revolution  Wheat, 
Electrophotography,  Input-Output  Economic  Analysis, 
Organophosphorus  Insecticides,  Oral  Contraceptives,  Magnetic 
Ferrites,  Video  Tape  Recorder),  the  significant  events  (important 
activity  in  the  history  of  an  innovation)  and  decisive  events  (a 
significant  event  which  provides  a  major  and  essential  impetus  to 
the  innovation)  which  contributed  to  the  innovation  were 
identified.  The  influence  of  various  exogenous  factors  on  the 
decisive  events  was  determined,  and  several  important 
characteristics  of  the  innovative  process  as  a  whole  were  obtained. 

Based  on  frequency  of  occurrence  of  the  highest  rankings  of 
the  exogenous  factors  on  the  decisive  events,  the  following 
rankings  of  importance  were  obtained: 

1.  Recognition  of  Technical  Opportunity  (motivation  of  the 
timely  improvement  of  an  existing  product  or  process)  ranked  first 
among  the  exogenous  factors; 

2.  Recognition  of  the  Need  (motivation  for  solving  the  problem
or  meeting  the  need  satisfied  by  the  eventual  innovation,  rather 
than  any  technological  need)  ranked  second; 

3.  Technical  Entrepreneur  (an  individual  within  the  performing 
organization  who  champions  a  scientific  or  technical  activity) 
ranked  third;

4.  Certain  institutional  factors,  such  as  Internal  R&D 
Management,  Availability  of  Funding,  Management  Venture  Decision, 
etc.,  ranked  fourth  collectively,  indicating  the  importance  of  the 
institutional  environment  to  the  innovative  process. 

Based  on  examination  of  characteristics  of  the  case  histories 
as  a  whole,  rather  than  focusing  on  decisive  events  as  above,  the 
following  generalizations  were  drawn: 

1.  The  technical  entrepreneur  is  a  characteristic  important  in 
nine  of  the  ten  innovations,  and  is  a  major  driving  force  in  the 
innovative  process; 

2.  Early  recognition  of  the  need  was  characteristic  of  the
history  of  nine  of  the  innovations; 

3.  Government  funding  was  instrumental  in  direct  support  of
seven  of  the  innovations.  More  generally,  availability  of
financial  support,  from  whatever  source,  emerged  as  an  important
feature  of  the  innovative  process; 

4.  The  occurrence  of  an  unplanned  confluence  of  technology  was 
characteristic  of  six  of  the  innovations.  Confluence  of  technology 
occurred  for  the  other  four  innovations  as  well,  but  as  a  result  of 
deliberate  planning,  rather  than  by  accident; 

5.  Most  of  the  innovations  originated  outside  the  organization 
that  developed  them; 

6.  Additional  supporting  inventions  were  required  during  the
development  effort  for  all  the  innovations  studied  to  arrive  at  a 
product  with  consumer  acceptance. 

Over  the  full  time  span  of  the  innovation,  nearly  34  percent 
of  the  significant  events  were  non-mission  oriented  research 
(NMOR),  38  percent  were  mission  oriented  research  (MOR),  26  percent
were  developmental,  and  a  few  percent  were  nontechnical.  Of  the 
total  events  in  the  period  prior  to  conception  of  the  innovative 
idea,  over  half  were  NMOR  and  one  third  MOR.  In  the  bounded 
interval  between  first  conception  and  first  realization,  16  percent 
were  NMOR,  with  the  remainder  split  among  MOR  (43  percent),
development  (38  percent),  and  nontechnical  events  (3  percent). 
Many  of  the  NMOR  events  in  the  bounded  interval  were  in  the  nature 
of  feedback  or  spinoff  basic  research  prompted  by  the  innovation. 
In  the  post-innovation  period,  when  diffusion  and  improvement  take 
place,  10  percent  of  the  events  were  NMOR,  39  percent  were  MOR,  and 
45  percent  were  development. 
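
The percentage breakdowns above can be laid out as a small table to see which periods are fully accounted for. The sketch below uses only the percentages quoted in the text; the pre-conception period is omitted because the text gives it only as "over half" and "one third", and the computed remainder is simply whatever share the text leaves unstated.

```python
# Percent of events by category, as quoted in the text for each period.
SHARES = {
    "full time span": {"NMOR": 34, "MOR": 38, "development": 26},
    "conception to realization": {
        "NMOR": 16, "MOR": 43, "development": 38, "nontechnical": 3,
    },
    "post-innovation": {"NMOR": 10, "MOR": 39, "development": 45},
}


def unreported_remainder(period_shares):
    """Percentage points not assigned to any category in the text."""
    return 100 - sum(period_shares.values())


if __name__ == "__main__":
    for period, shares in SHARES.items():
        parts = ", ".join(f"{name} {pct}%" for name, pct in shares.items())
        print(f"{period}: {parts}; unstated {unreported_remainder(shares)}%")
```

The two-point remainder for the full time span matches the "few percent" of nontechnical events; the text does not say how the six-point remainder in the post-innovation period is distributed.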

The  number  of  NMOR  events  peaked  in  the  period  three  to  four 
decades  prior  to  the  culmination  of  the  innovation,  whereas  the 
number  of  MOR  and  development  events  peaked  in  the  decade  preceding 
the  date  of  innovation.  Half  of  the  NMOR  events  occurred  30  years 
preceding  innovation;  half  of  the  MOR  events  occurred  in  the  15 
years  prior  to  innovation,  and  half  the  developmental  effort  took 
place  within  the  ten  years  preceding  innovation. 

The  study  authors  recognized,  to  some  degree,  that  the  focus 
on  specific  events  did  not  allow  sufficient  credit  to  be  allocated 
to  the  indirect  impacts  of  research.  As  they  stated:  "this  kind  of 
analysis  tends  to  underplay  the  role  of  NMOR  in  the  innovative 
process,  since  it  does  not  portray  the  importance  of  the  general 
background  of  science  necessary  for  the  other  categories  of 
technical  events.  For  example,  MOR  and  developmental  activities  in 
insecticides  would  have  been  impossible  without  the  antecedent 
totality  of  organic  chemistry.  Similarly,  research  on 
contraception  depended  on  the  basic  science  background  of 
reproductive  biology.  As  a  further  example,  in  the  case  involving 
grain  improvement,  Hybrid  Small  Grains  and  Green  Revolution  Wheat
show  a  low  percentage  of  NMOR  events  (20  percent),  but  these
percentages  would  be  higher  if  the  early  NMOR  events  credited  to 
Hybrid  Corn  were  also  counted  in  their  totals".  They  correctly 
identified  the  absence  of  recognition  given  to  specific  supporting 
fields  of  research.  However,  they  did  not  identify  or  attempt  to 
account  for  the  impacts  of  the  fundamental  research  from  many 
fields  which  resulted  in  the  instrumentation,  theoretical,  and 
computational  capabilities  necessary  for  these  supporting  research
fields  to  advance. 

A  Recent  TRACES  Study 

In  the  mid-1980s,  the  National  Cancer  Institute  (NCI) 
initiated  an  assessment  to  determine  the  effectiveness  of  different 
research  settings  or  support  mechanisms  in  bringing  about  important 
advances  in  cancer  research.  The  approach  taken  was  analogous  in 
concept  to  the  initial  TRACES  study,  with  the  addition  of  citation 
analyses  to  provide  an  independent  measure  of  the  impact  of  the 
Trace  papers  (papers  associated  with  each  key  'event'),  and  by
adding  control  sets  of  papers. 

Thirteen  important  'Advances'  (key  'events')  in  cancer
research  were  defined  by  a  senior  advisory  panel  of  experts,  and
the  key  papers  associated  with  these  'Advances'  and  in  the
historiographic  research  streams  were  identified.  Both  the  support 
source  and  the  institutional  setting  of  the  papers  were  analyzed. 
In  addition  to  the  Trace  papers,  three  other  sets  of  papers  were 
developed  to  serve  as  comparison  sets  whose  properties  were 
contrasted  with  the  Trace  papers. 

The  study  concluded  that  all  the  research  settings,  and  all 
the  support  mechanisms  (small  and  large  grants,  contracts, 
intramural  NCI,  etc.)  contributed  significantly  to  the  'Advances',
with  no  single  mechanism  or  setting  represented  disproportionately.
More  specifically,  NCI  provided  37  percent  of  the  acknowledged
support  for  the  Trace  papers,  there  was  a  large  amount  of
cooperative,  multi-sponsor  support  for  the  Trace  papers,  and  papers
on  the  Traces,  whatever  the  support  mechanism,  were  extremely 
highly  cited  -  eight  times  as  frequently  as  expected  [Narin,  1989]. 

While  indirect  impacts  of  research  on  the  'Advances'  were  not
a  goal  of  this  study  and  were  not  evaluated,  the  additional 
methodology  (mainly  citation  and  co-citation  analysis)  used  in 
performing  the  latest  Traces  incarnation  could  shed  some  light  on 
indirect  impacts.  For  example,  one  of  the  control  sets  of  papers 
used  in  the  study  was  termed  Augmentation  papers  and  consisted  of 
closely  related  contemporaneous  papers  cited  with  the  Traces  papers 
and  identified  through  co-citation  techniques.  Another  of  the 
control  sets  was  called  Science  Base  and  consisted  of  papers  cited
by  the  Trace  papers,  representing  the  precursor  knowledge  upon
which  the  selected  major  'Advance'  was  dependent.

These  two  sets  of  papers  provided  some  idea  of  the  direct 
impact  of  other  science  fields  on  the  cancer  fields  of  interest 
('Advances').  If  citation  and  co-citation  analysis  were  done  on
the  Augmentation  papers  and  the  Science  Base  papers,  combined  with 
word  frequency  and  co-word  analyses  of  these  paper  sets  [Kostoff, 
1991d,  1992a,  1993c,  1993e,  1993f,  1994h],  and  the  process  repeated
a  few  times,  then  many  of  the  pathways  through  which  indirect 
impacts  on  the  'Advances'  occur  could  be  identified,  and  the
magnitude  of  the  impacts  perhaps  quantified  to  some  degree.  The 
amount  of  data  and  analyses  required  would  be  large,  but  based  on 
the  results  and  conclusions  of  a  recent  network-based  approach  to 
evaluating  indirect  impact  of  research  [Kostoff,  1991c,  1992a, 
1994i],  the  computational/analytical  problem  is  of  necessity  large
because  of  the  potentially  large  number  of  pathways  through  which 
direct  and  indirect  impacts  of  research  can  occur. 
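
The idea of repeating the citation step "a few times" can be sketched as a depth-limited walk backward through reference lists. The example below is a minimal illustration only: the paper identifiers and reference lists are invented, and it covers only the backward-citation step, not the co-citation, word frequency, or co-word analyses mentioned above.

```python
from collections import deque

# Hypothetical reference lists: paper -> papers it cites. All invented.
CITES = {
    "trace_1": ["base_a", "base_b"],
    "base_a": ["older_x"],
    "base_b": ["older_x", "older_y"],
    "older_x": [],
    "older_y": ["oldest_z"],
}


def precursor_generations(cites, start_papers, max_rounds):
    """Return {paper: generation} for every paper reached by following
    reference lists at most `max_rounds` times from the starting papers,
    which are generation 0."""
    generation = {paper: 0 for paper in start_papers}
    queue = deque(start_papers)
    while queue:
        paper = queue.popleft()
        if generation[paper] == max_rounds:
            continue  # do not expand beyond the allowed number of rounds
        for ref in cites.get(paper, []):
            if ref not in generation:
                generation[ref] = generation[paper] + 1
                queue.append(ref)
    return generation


if __name__ == "__main__":
    print(precursor_generations(CITES, ["trace_1"], max_rounds=2))
```

Each additional round can multiply the number of papers to be examined, which is one way to see why the data and analysis requirements grow quickly.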

ACCOMPLISHMENTS  BOOKS 


Background 

Semi-quantitative  methods,  such  as  Hindsight  and  TRACES,
require  substantial  commitments  of  people,  time,  and  dollars. 
Because  of  the  large  resource  requirements,  these  types  of  studies 
are  performed  relatively  infrequently.  A  more  common  vehicle  used 
by  Federal  research  sponsoring  organizations  to  display  the  impacts 
of  funded  research  on  advancement  of  science,  actual  or  potential 
impacts  on  advancement  of  allied  science  or  technology,  and 
potential  impacts  on  the  organization's  mission  is  the 
accomplishments  book.  This  type  of  document  tends  to  present 
descriptions  of  selected  scientific  accomplishments  in  sufficient 
detail  for  the  reader  to  understand  the  science  that  was 
accomplished,  and  have  some  idea  of  the  potential  importance  of  the 
research  to  mission,  technology,  and  perhaps  the  commercial  sector. 
The  accomplishments  books  make  no  pretenses  about  being  all- 
inclusive,  nor  do  they  usually  include  quantitative  estimates  of 
impact.  The  accomplishments  are  drawn  from  the  different 
disciplines  funded  by  the  organization,  and  are  meant  to  be 
portrayed  as  representative  of  the  breadth  of  activity.  A  few  of 
these  books  are  described  briefly;  the  books  selected  should  be 
viewed  as  representative  of  the  genre. 

Office  of  Naval  Research  (ONR)

Periodically,  the  ONR  produces  a  book  of  significant 
accomplishments  [e.g.,  ONR,  1992].  The  accomplishments  are
categorized  into  four  major  areas,  reflecting  the  ONR  Core 
Competency  structure:  Ocean  Sciences,  Advanced  Materials, 
Information  Sciences,  Sustaining  Program.  Thirty-one 
accomplishments  are  described  in  the  most  recent  incarnation,  one 
per  page,  including  topics  such  as  New  Gulf  Stream  Variability 
Model,  New  Semiconductor  Materials  for  At-Sea  Computers, 
Classifying  Underwater  Objects,  and  Merging  Living  Cells  with 
Electronics.  The  reader  of  this  document  receives  a  synopsis  of 
the  many  areas  in  which  ONR  is  involved,  how  these  areas  can  impact 
the  Navy  and  Marine  Corps  potentially,  and  the  types  of  people  and 
organizations  performing  the  research. 

Air  Force  Office  of  Scientific  Research  (AFOSR)

The  AFOSR  accomplishments  book  is  similar  in  structure  and 
spirit  to  that  of  ONR.  In  one  incarnation  [AFOSR,  1989],  the 
accomplishments  were  divided  among  the  six  technical  disciplines 
which  reflect  the  AFOSR  management  structure:  Aerospace  Sciences, 
Chemical  and  Atmospheric  Sciences,  Electronic  and  Material 
Sciences,  Life  Sciences,  Mathematical  and  Information  Sciences,  and 
Physical  and  Geophysical  Sciences.  Twenty-five  accomplishments 
were  described,  with  more  or  less  equal  representation  from  each  of
the  six  disciplines.  As  in  the  ONR  book,  no  quantification  of
impact  was  attempted. 

Department  of  Energy  (DOE),  Office  of  Health  and  Environmental
Research  (OHER)

A  somewhat  different  type  of  accomplishments  book  was 
generated  by  the  DOE,  Office  of  Energy  Research,  for  one  of  its 
component  organizations,  OHER  [DOE,  1983,  1986].  The  approach 
taken  was  to  describe  the  40-year  history  of  OHER,  and  present 
selected  accomplishments  in  different  research  areas  from  different 
points  in  time.  This  technique  allowed  impacts  and  benefits  of  the 
research  to  be  tracked  through  time,  and  in  some  cases  to  be 
quantified  as  well. 

Costs  of  these  programs,  or  subprograms,  were  not  provided, 
and  it  is  therefore  difficult  to  relate  the  benefits,  where  stated, 
to  the  costs.  Some  of  the  benefits,  such  as  an  improved  knowledge 
base  on  which  to  set  health  regulatory  standards,  would  be 
extremely  difficult  to  quantify.  In  some  cases,  the  report  does 
attempt  this  quantification.  For  example,  in  discussing  radiation 
standards,  the  report  states:  "More  stringent  standards,  which 
might  have  been  necessary  in  the  absence  of  knowledge  gained 
through  the  research  program,  could  have  easily  cost  electric  power 
consumers  an  additional  $2  billion  annually"  [DOE,  1983]. 

Other  examples  of  research  accomplishments  probably  not 
amenable  to  quantification  are  presented  throughout  the  report, 
such  as  development  of  a  capability  to  predict  the  travel  and 
dispersion  of  hazardous  substances  (space  debris,  nuclear  weapons 
tests  byproducts)  released  into  the  atmosphere.  No  numbers  are 
associated  with  this  accomplishment. 

There  are  examples  of  hardware,  or  products,  which  resulted 
from  the  research,  and  quantification  is  applied  to  some  of  these 
accomplishments.  The  flow  cytometer  and  centrifugal  fast  analyzer 
(CFA)  were  developed  to  help  search  for  radiation  effects  on 
humans.  These  have  evolved  into  commercial  products,  and  the 
quantified  benefit  given  in  the  report  is:  "About  10,000  units  are 
in  worldwide  use".  In  the  second  volume  [DOE,  1986],  benefits  for 
the  centrifugal  fast  analyzer  are  stated  as:  "estimated  savings  of 
$30  to  $90  million/year."  The  high  resolution  gamma  ray 
spectrometer  was  developed  to  distinguish  between  radioactive 
elements  with  emissions  of  similar  energies.  Today,  it  is  broadly 
used  to  monitor  the  environment  and  in  many  research  applications 
as  well,  and  the  quantified  benefit  in  the  first  report  is:  "Based
on  the  value  of  rapid  analysis  as  compared  with  slower 
alternatives,  the  benefit  to  nuclear  plant  operation  alone  is 
estimated  to  be  $20  million  annually". 

A  detailed  reading  of  this  document  uncovers  the  difficulties
of  trying  to  identify,  assign,  and  quantify  costs  and  benefits  of 
basic  research.  As  TRACES  and  other  similar  studies  have  shown, 
the  chain  of  events  leading  to  an  innovation  is  long  and  broad. 
Many  researchers  over  many  years  have  been  involved  in  the  chain, 
and  many  funding  agencies,  some  simultaneously  with  the  same 
researchers,  may  have  been  involved.  How  should  costs  and  benefits 
be  allocated  under  such  circumstances?

For  example,  in  volume  2  [DOE,  1986],  the  'original'  funding
for  the  centrifugal  fast  analyzer  project  was  shared  by  the  Atomic 
Energy  Commission  (AEC)  and  the  National  Institutes  of  Health 
(NIH),  and  later  funding  was  provided  by  the  National  Aeronautics
and  Space  Administration  (NASA)  for  a  zero-G  variant.  How  should 
credit  for  the  benefits  be  shared  among  these  three  agencies?  And 
what  about  all  the  fundamental  research  that  led  up  to  the 
invention  of  the  CFA;  how  should  the  benefits  be  allocated  to  the 
researchers  and  funding  agencies  that  participated? 

Again,  in  volume  2,  in  the  section  about  Iodine-131  therapy 
for  hyperthyroidism,  it  is  stated  that  the  basic  application  of 
Iodine-131  to  toxic  goiter  diseases  was  developed  from  1939-1941. 
The  initial  AEC  involvement  is  reported  in  1946  (when  the  AEC  was 
formed)  when  Iodine-131  from  nuclear  reactor  fission  products  was 
shipped  from  Oak  Ridge  National  Laboratories.  The  report  states 
that  "total  estimated  savings  in  treatment  cost  because  of  the  use 
of  Iodine-131  could  be  as  high  as  $280  million/year".  How  much  of 
this  amount  should  be  credited  to  AEC  research?  All  $280M?  None 
(the  initial  innovation  was  completed  before  the  AEC  was  formed)? 
Only  the  portion  of  the  total  benefits  resulting  from  cheaper 
isotopes?  These  are  difficult  questions  and  are  endemic  to  any
study  of  basic  research  which  tries  to  assign  costs  and  benefits  to 
particular  innovations. 
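
The three candidate answers posed above can be written as alternative allocation rules, which makes plain how strongly the credited benefit depends on the rule chosen. In the sketch below, only the $280 million/year figure comes from the report; the rule names, and in particular the `isotope_share` fraction, are hypothetical illustrations, since the report gives no value for the portion attributable to cheaper isotopes.

```python
# Upper-bound savings quoted in the report, in millions of dollars per year.
TOTAL_SAVINGS_MUSD = 280.0


def credited_benefit(total, rule, isotope_share=None):
    """Benefit credited to AEC research under one of three illustrative rules.

    rule: "all"           -> credit the full amount
          "none"          -> credit nothing (innovation predates the AEC)
          "isotope_share" -> credit only a fraction `isotope_share` (0 to 1),
                             a hypothetical parameter not given in the report
    """
    if rule == "all":
        return total
    if rule == "none":
        return 0.0
    if rule == "isotope_share":
        if isotope_share is None or not 0.0 <= isotope_share <= 1.0:
            raise ValueError("isotope_share must be a fraction between 0 and 1")
        return total * isotope_share
    raise ValueError(f"unknown rule: {rule}")


if __name__ == "__main__":
    print(credited_benefit(TOTAL_SAVINGS_MUSD, "all"))
    print(credited_benefit(TOTAL_SAVINGS_MUSD, "none"))
    # 0.25 is an arbitrary placeholder, chosen only to show the calculation.
    print(credited_benefit(TOTAL_SAVINGS_MUSD, "isotope_share", isotope_share=0.25))
```

The spread between the rules is the entire quoted benefit, which is the sense in which such allocations are arbitrary.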

DOE  High  Energy  Physics  Program 

Another  historiographic-based  approach  to  describing  program 
accomplishments  is  that  used  by  the  DOE  High  Energy  Physics  Program 
[DOE,  1990].  The  history  and  interrelatedness  of  the  diverse 
elements  of  the  program,  followed  by  the  wider  applications  of  high 
energy  physics,  constitute  this  accomplishments  book.  One  chapter 
is  devoted  to  the  impact  of  knowledge  gained  from  high  energy 
physics  on  the  fields  of  astrophysics  and  cosmology.  No 
quantification  is  attempted,  since  improved  understanding  of  the 
universe  does  not  lend  itself  to  that  type  of  analysis. 

More  practical  benefits  resulting  from  better  understanding  of 
high  energy  beams,  as  well  as  resulting  from  the  devices, 
instruments,  and  technologies  that  were  developed  to  perform  high 
energy  physics  research,  are  presented  at  the  end  of  the  report. 
Here,  the  different  applications  are  described  (tumor  treatment, 
medical  diagnosis,  ion  implantation,  materials  research,  x-ray 
lithography,  radioisotope  production,  superconducting  magnets, 
klystrons,  etc.),  but  no  quantification  is  attempted.

Advanced  Research  Projects  Agency  (ARPA)  Technical  Accomplishments

The  ARPA  was  established  in  1958  in  response  to  Sputnik. 
ARPA's  initial  primary  focuses  were: 

1.  The  'Presidential  Issues'  of  space; 

2.  Ballistic  Missile  Defense  (Project  DEFENDER)  and  nuclear 
test  detection  (VELA);

3.  Avoiding  future  'Sputniks'  as  its  broader  overall  charter. 



Over  its  lifetime,  as  its  mission  has  been  redefined  and 
refocused,  it  has  sponsored  a  wide  variety  of  thrust  areas, 
including  the  following  major  areas: 

1.  Defense  Manufacturing; 

2.  Nuclear  Test  Monitoring; 

3.  Naval  Technologies;

4.  Materials  and  Components; 

5.  Sensors  and  Surveillance; 

6.  Command,  Control,  and  Communications; 

7.  Information  Processing; 

8.  Ground  Systems  and  Weapons; 

9.  Air  Systems;

10.  AGILE  (counter-insurgency  R&D);

11.  High  Energy  Systems; 

12.  DEFENDER  and  Space  Defense;  Space  Systems. 

In  the  early  1990s,  the  Institute  for  Defense  Analyses  (IDA)
produced  a  massive  three-volume  set  describing  the  accomplishments 
of  ARPA  [IDA,  1991].  Of  the  hundreds  of  projects  and  programs 
funded  by  ARPA  over  its  then  (1988)  30-year  lifetime,  49  were
selected  and  studied  in  detail.  Two  criteria  were  used  by  the  IDA
project  team  and  the  ARPA  management  collectively  in  selecting
projects/programs  to  be  studied:  1)  the  importance  of  the
projects,  judged  on  the  basis  of  evidence  in  attestation  and 
documentation;  and  2)  the  expected  availability  of  data.  The  focus 
of  the  49  retrospectives  documented  was: 

1.  what  were  the  origins  of  each  project  or  program; 

2.  what  did  ARPA  itself  do; 

3.  what  was  the  result,  impact,  and  effect  of  the  work  ARPA 
supported? 

The  structure  of  the  description  of  each  accomplishment  was: 

1.  a  brief  overview  of  the  history  and  accomplishment; 

2.  a  detailed  technical  history  of  the  project; 

3.  observations  on  its  success. 

At  the  end  of  each  project  description  was  a  time  evolution 
chart  of  the  project.  The  actions/  achievements  of  the  different 
organizations  involved  in  the  project's  evolution  (preceding, 
paralleling,  and  succeeding  ARPA's  involvement)  were  shown  as  a 
function  of  time.  The  main  ARPA  involvement  (ARPA  project  track) 
was  highlighted,  related  ARPA  actions  or  ARPA  influence  were  shown, 
ARPA  technology  transfer  was  shown,  and  related  actions  by  other 
groups  were  shown.  At  the  end  of  each  project  writeup,  the  ARPA
costs  over  the  project  life  (where  known)  were  identified  and  some 
estimate  of  the  dollar  benefits  (where  possible)  was  presented. 

In  general,  the  outcomes  of  ARPA  projects  have  included:

1.  development  or  initial  demonstrations  of  new  technology;

2.  demonstrations  of  new  applications  of  known  technology; 

3.  development  and  demonstration  of  new  concepts  of
experimentation  or  operation;

4.  integration  of  diverse  technologies  into  new  system
concepts  for  the  first  time. 

Often,  more  than  one  of  these  kinds  of  payoff  could  be 
achieved  by  the  same  project.  Most  of  the  projects  supported  were 
technology  or  systems  development  rather  than  basic  research,  but 
many  were  fed  by  basic  as  well  as  applied  research.  The  qualities 
of  ARPA-supported  programs  and  projects  that  contributed  to  success
can  be  summarized: 

1.  A  need  existed  for  what  the  output  could  do; 

2.  There  was  a  strong  commitment  by  individuals  to  a  concept; 

3.  Bright  and  imaginative  individuals  were  given  the
opportunity  to  pursue  ideas  with  minimal  bureaucratic  encumbrance;

4.  There  was  an  ongoing  stream  of  technical  developments  and
evolution; 

5.  ARPA  management  gave  strong,  top-level  management  support; 

6.  There  was  explicit  effort,  taken  early,  to  improve 
acceptance  by  the  user  community. 

The  degree  of  success  and  impact  is  more  difficult  to  measure. 
In  some  cases,  the  results  of  projects  or  programs,  usually 
expressed  in  hardware,  were  transferred  fully  to  a  user.  Other 
transfers  have  been  partial,  limited,  or  indirect.  Given  the 
multifaceted  nature  of  some  projects,  several  of  these 
characteristics  apply  to  the  same  project.  Finally,  success  in 
transferring  the  hardware  or  knowledge  gained  in  ARPA  programs 
often  depends  on  timing  and  the  relationship  to  other  events  and 
programs.  The  report  provides  an  excellent  example  of  the  impact 
of  exogenous  events  on  the  fate  of  SLCSAT,  a  project  which  has  had 
some  successful  technology  validation  of  satellite-submarine  laser 
communication.  Whether  the  Navy  adopts  the  system  for 
communication  with  submarines  will  depend  on  the  Navy's  concepts  of 
submarine  operation  in  the  new  tactical  and  strategic  world  that  is 
emerging  in  the  aftermath  of  the  cold  war  and  the  budget  available 
for  such  purposes  in  the  new  environment. 

The  impacts  of  the  more  fundamental  ARPA  areas  of  support, 
such  as  Materials  Sciences  and  Information  Processing,  are  more 
difficult  to  measure  than  impacts  of  the  development-oriented 
projects,  where  transition  to  a  defined  user  is  somewhat  clearer. 
The  report  defines  ARPA's  impact  in  these  technology  base  areas  as 
having  stimulated  an  infrastructure  and  new  disciplines.  It 
identifies  programs  established  at  universities,  interdisciplinary 
efforts  initiated,  projects  in  fundamental  technologies  accelerated 
by  ARPA  funding,  and  hardware/  software  products  which  resulted. 

Similar  to  the  other  semiquantitative  approaches  described 
above,  the  IDA  report  does  not  (in  the  author's  opinion)  account 
sufficiently  for  benefits  resulting  from  indirect  impacts  of
research.  In  the  time  evolution  charts  at  the  end  of  each  project
writeup,  a  few  critical  events/  technologies  which  preceded  the 
ARPA  involvement  are  shown,  and  then  the  ARPA  contribution  is 
highlighted.  The  existing  pool  of  scientific  and  technological 
knowledge,  which  ARPA  exploited  very  productively,  was  developed 
over  many  years  by  many  diverse  organizations  and  was  a  necessary 
condition  for  ARPA  to  achieve  its  successes  and  impacts.  The 
people  and  organizations  who  developed  this  base  of  technology 
complemented  the  ARPA  effort,  and  should  share  in  the  benefits. 

One  of  the  major  impacts  of  ARPA  support,  which  could  be 
quantified  by  relating  costs  to  benefits,  is  that  projects  were 
brought  to  fruition  earlier  than  they  would  have  been  without  ARPA 
support.  Areas  such  as  gallium  arsenide  semiconductors,  computer
architectures  (RISC,  systolic  array,  symbolic  processing,  parallel
processing,  neural  networks),  and  the  Ada  language,  for  example,  were
accelerated  greatly  because  of  ARPA's  involvement  and  support.
Future  ARPA  accomplishments  reports  could  relate  the  ARPA  program 
(or  specific  project)  expenditures  (in  a  discounted  sense)  to  the 
earlier  realization  of  benefits  (in  a  discounted  sense)  due  to  ARPA 
support  to  provide  additional  measures  of  the  effectiveness  of 
ARPA's  funding  [Mansfield,  1991]. 
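
A minimal sketch of the suggested comparison follows. It assumes a constant annual benefit, a fixed end year for the benefit stream, and a single discount rate; the function names and all numerical values are hypothetical and are chosen only to show the arithmetic, not drawn from the IDA report or from [Mansfield, 1991].

```python
def present_value(cash_flows, rate):
    """Discount a list of (year, amount) pairs back to year 0."""
    return sum(amount / (1.0 + rate) ** year for year, amount in cash_flows)


def acceleration_value(annual_benefit, start_year, years_saved, horizon, rate):
    """Present-value gain from benefits starting `years_saved` years earlier.

    Benefits are assumed constant and to run up to (not including) `horizon`
    in both cases, so acceleration adds benefit years at the front.
    """
    late = [(y, annual_benefit) for y in range(start_year, horizon)]
    early = [(y, annual_benefit)
             for y in range(start_year - years_saved, horizon)]
    return present_value(early, rate) - present_value(late, rate)


# Hypothetical inputs: $100M/year of benefits that would have begun in
# year 10 without support and begin in year 7 with it; $20M/year of
# sponsor spending in years 0 through 4; 5 percent discount rate.
RATE = 0.05
gain = acceleration_value(annual_benefit=100.0, start_year=10,
                          years_saved=3, horizon=30, rate=RATE)
cost = present_value([(y, 20.0) for y in range(5)], RATE)
ratio = gain / cost
print(f"discounted gain from acceleration: {gain:.1f}")
print(f"discounted sponsor cost:           {cost:.1f}")
print(f"benefit/cost ratio:                {ratio:.2f}")
```

Under these assumptions only the benefit years moved forward are credited to the sponsor, which avoids claiming the full value of a technology that would eventually have emerged anyway.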

PRINCIPLES  OF  HIGH  QUALITY  RETROSPECTIVE  STUDIES 

A  careful  reading  of  the  above,  and  many  other,  retrospective 
studies  shows  that  it  is  difficult  to  assess  study  quality  on  the 
sole  basis  of  the  published  report.  However,  principles  for  high 
quality  retrospective  studies  have  been  generated  by  the  author  by 
integrating  the  contents  of  these  reports  with  personal  experience 
in  conducting  retrospective  studies.  A  high  quality  retrospective 
study  is  an  accurate  reflection  of  the  evolution  and  relation  of
all  critical  sciences  and  technologies  which  resulted  in  the 
technology  of  present  interest.  Thus,  a  high  quality  retrospective 
study  is  analogous  to  a  high  resolution  picture  of  the  evolving 
relationships  among  science  and  technology  areas  related  critically 
to  the  focal  technology,  and  incorporates  especially  the  concepts 
of  awareness,  coordination,  and  completeness.  More  specific 
requirements,  or  underlying  principles,  necessary  for  a  high 
quality  retrospective  study  can  be  formulated  as  follows. 

The  most  important  factor  is  the  commitment  of  the 
retrospective  study  organization's  senior  management  to 
high-quality  retrospective  studies,  and  the  associated  emplacement 
of  rewards  and  incentives  to  encourage  such  retrospective  studies. 

The  second  most  important  factor  is  the  retrospective  study 
manager's  motivation  to  construct  a  technically  credible  and 
visionary  retrospective  study.  The  retrospective  study  manager 
sets  the  boundary  conditions  and  constraints  on  the  retrospective 
study  scope,  structures  the  working  groups,  and  selects  the  final 
retrospective  study  elements  from  a  myriad  of  inputs.  In  some 
organizations,  the  retrospective  study  manager  has 
the  latitude  to  select  the  complete  retrospective  study  process  and 
criteria,  and  in  all  organizations  presently  has  the  latitude  to
select  the  retrospective  study  contributing  technical  experts  by  a
non-random  process.  If  the  retrospective  study  manager  does  not 
follow,  either  consciously  or  subconsciously,  the  highest  standards 
in  selecting  these  experts,  the  retrospective  study's  final  form 
could  be  substantially  determined  even  before  the  study  process 
begins. 

The  third  most  important  factor  consists  of  the  study  experts' 
competence  and  objectivity.  Each  expert  should  be  technically 
competent  in  his  subject  area,  and  the  competence  of  the  total 
retrospective  study  development  team  should  cover  the  multiple 
research  and  technology  areas  critically  related  to  the  science  or 
technology  area  of  present  interest.  In  addition,  the  team's  focus 
should  not  be  limited  to  disciplines  related  only  to  the  focal 
technology  area  (which  tends  to  reinforce  the  status  quo  and 
further  promulgate  development  along  very  narrow  lines),  but  should
be  broadened  to  disciplines  and  technologies  well  beyond  the  focal 
technology. 

For  retrospective  studies  which  will  be  used  as  a  basis  for 
comparison  of  science  and  technology  programs  or  projects,  the 
fourth  most  important  factor  is  normalization  and  standardization 
across  different  retrospective  studies,  study  component  teams,  and 
science  and  technology  areas.  For  science  and  technology  areas 
which  have  some  similarity,  use  of  common  experts  (on  the  study 
teams)  with  broad  backgrounds  which  overlap  the  disciplines  can 
provide  some  degree  of  standardization.  For  very  disparate  science 
and  technology  areas,  some  allowances  need  to  be  made  for  the 
relative  strategic  value  of  each  discipline  to  the  organization, 
and  arbitrary  corrections  applied  for  benefit  estimation 
differences  and  biases.  Even  in  this  case  of  disparate 
disciplines,  some  normalization  is  possible  by  having  some  common 
team  members  with  broad  backgrounds  contributing  to  the 
retrospective  studies  for  diverse  programs  and  projects. 

The  fifth  most  important  factor  is  criteria  for  retrospective 
study  component  selection.  Since  retrospective  studies  tend  to 
focus  on  the  critical  science  and  technology  events  which  led  to 
successful  technologies/  systems,  the  definition  of  criteria  for 
'successful'  and  'critical'  is  of  utmost  importance  for 
establishing  the  credibility  of  the  retrospective  study. 

A  factor  of  equal  importance  is  reliability  or  repeatability.
To  what  degree  would  a  retrospective  study  be  replicated  if  a 
completely  different  team  were  involved  in  its  construction?  If 
each  team  were  to  construct  a  completely  different  retrospective 
study  for  the  same  topic,  then  what  meaning  or  credibility  or  value 
can  be  assigned  to  any  retrospective  study?  To  minimize 
repeatability  problems,  a  reasonably  sizeable  segment  of  the
competent  technical  community  should  be  involved  in  the
construction  and  review  of  the  retrospective  study.  For
government-constructed  retrospective  studies,  this  does  not  present 
a  conceptual  problem,  although  it  might  present  a  logistics  problem 
for  sufficiently  large  community  involvement.  For  industry- 
constructed  retrospective  studies,  where  proprietary  problems  could 
arise  if  the  external  community  becomes  involved,  the  participation
may  have  to  be  limited.  The  recommendation  should  be
reinterpreted  as  'to  the  degree  possible  within  organizational
constraints'.

A  sixth  critical  factor  for  quality  retrospective  studies  is 
cost.  The  true  total  costs  of  developing  a  high  quality 
retrospective  study  with  substantial  community  input  can  be 
considerable,  but  tend  to  be  understated.  For  high  quality 
retrospective  studies,  where  sufficient  expertise  is  represented  on 
the  study  team,  the  major  contributor  to  total  costs  is  the  time  of 
all  the  individuals  involved  in  developing  and  reviewing  the 
retrospective  study.  With  high  quality  personnel  involved  in  the 
development  and  review  process,  time  costs  are  high,  and  the  total 
study  costs  can  be  non-negligible.  Costs  should  not  be  neglected 
in  designing  a  high  quality  retrospective  study  development 
process.

The  final  critical  factor,  and  perhaps  the  foundational 
factor,  in  high  quality  retrospective  study  development  is  the 
maintenance  of  high  ethical  standards  throughout  the  process. 
There  is  a  plethora  of  potential  ethical  issues,  because  there  is 
an  inherent  bias/  conflict  of  interest  in  the  process  when  real 
experts  are  desired  as  retrospective  study  performers  and 
reviewers.  The  retrospective  study  development  managers  need  to  be 
vigilant  for  undue  signs  of  distortion  aimed  at  personal  gain. 

SEMI-QUANTITATIVE  METHODS  -  SUMMARY  AND  CONCLUSIONS 

A  variety  of  approaches  were  presented  which  showed  different 
types  of  impacts  of  research,  but  little  or  no  quantification  of 
impact  was  performed.  Hindsight,  TRACES,  and,  to  some  degree,  the 
DARPA  accomplishments  books  had  some  similar  themes.  All  these 
methods  used  a  historiographic  approach,  looked  for  significant 
research  or  development  events  in  the  metamorphosis  of  research 
programs  in  their  evolution  to  products,  and  attempted  to  convince 
the  reader  that:  (1)  the  significant  research  and  exploratory 
development  events  in  the  development  of  the  product  or  process 
were  the  ones  identified;  (2)  typically,  the  organization 
sponsoring  the  study  was  responsible  for  some  of  the  (critical) 
significant  events;  (3)  the  final  product  or  process  to  which  these 
events  contributed  was  important;  and  (4)  while  the  costs  of  the 
research  and  development  were  not  quantified,  and  the  benefits 
(typically)  were  not  quantified,  the  research  and  development  were 
worth  the  cost. 

As  the  historiographic  analyses  (Hindsight/  TRACES)  of  a 
technology  or  system  have  shown,  if  the  time  interval  in  which  the 
antecedent  critical  events  occur  is  arbitrarily  truncated,  as  in 
the  two-decade  time  interval  Hindsight  case,  the  impacts  of  basic 
research  on  the  technology  or  system  will  not  be  given  adequate 
recognition.  As  Hindsight  and  the  different  TRACES  studies  have 
shown,  the  number  of  mission  oriented  research  events  peaks  about 
a  decade  before  the  technology  innovation.  However,  these  studies 
have  also  shown  that  the  number  of  non-mission  oriented  research 
events  peaks  about  three  decades  before  the  technology  innovation,
and  eight  or  nine  decades,  or  more,  may  be  necessary  in  some  cases
to  recognize  the  original  critical  antecedent  events.  Over  a  long
time  interval,  the  majority  of  key  R&D  events  tend  to  be
non-mission  oriented.  Thus,  future  studies  of  this  type  should  allow
time  intervals  of  many  decades  to  ensure  that  critical  non-mission
oriented  research  events  are  captured.

Even  in  those  cases  when  an  adequate  time  interval  was  used, 
and  critical  non-mission  oriented  events  were  identified,  the 
cumulative  indirect  impacts  of  basic  research  were  not  accounted 
for  by  any  of  the  retrospective  approaches  published  or  in  use 
today.  A  recent  study  [Kostoff,  1991c,  1992a,  1994i]  which 
examined  impacts  of  research  on  other  research  and  technology 
through  direct  and  indirect  paths  using  a  network  approach  showed 
that  the  indirect  impacts  of  fundamental  research  can  be  very  large 
in  a  cumulative  sense.  Future  retrospective  studies  should  devote 
more  effort  to  identifying  indirect  impacts  of  research  to  enhance 
their  credibility.  While  indirect  impacts  of  research  are  much 
more  difficult  to  identify  than  direct  impacts,  and  the  data 
gathering  effort  is  much  larger  and  more  complex,  neglect  of 
indirect  impacts  skews  the  results  and  conclusions  relative  to  the 
value  of  basic  research  significantly.  Use  of  some  of  the  advanced 
computer-based  technologies  available  today,  such  as  the  network 
approach  referenced  above  or  citation  analysis  [Narin,  1989],  could 
identify  many  of  the  pathways  of  the  indirect  impacts  of  research. 
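
The distinction between direct and indirect impacts can be illustrated with a small reachability computation on an influence network. This is only a sketch of the general idea; it is not the method of the cited network study, and the network, node names, and function names below are hypothetical.

```python
from collections import deque

# Hypothetical influence network: each key feeds directly into the listed items.
INFLUENCES = {
    "basic_research_A": ["applied_B", "applied_C"],
    "applied_B": ["technology_D"],
    "applied_C": ["technology_D", "technology_E"],
    "technology_D": ["system_F"],
    "technology_E": ["system_F"],
    "system_F": [],
}


def downstream(node, graph):
    """Return every item reachable from `node` (direct and indirect)."""
    seen, queue = set(), deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


def impact_counts(node, graph):
    """Split the downstream items of `node` into direct and indirect counts."""
    direct = set(graph.get(node, []))
    indirect = downstream(node, graph) - direct
    return len(direct), len(indirect)


direct, indirect = impact_counts("basic_research_A", INFLUENCES)
print(f"direct impacts: {direct}, indirect impacts: {indirect}")
```

Even in this six-node example the indirect items outnumber the direct ones, so a study that records only first-generation links would understate the reach of the basic research node.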

A  detailed  reading  of  some  of  the  studies  which  attempted  to 
incorporate  economic  quantification  showed  the  difficulties  of 
trying  to  identify,  assign,  and  quantify  costs  and  benefits  of 
basic  research,  especially  at  a  micro  level.  As  TRACES  and  other 
similar  studies  have  shown,  the  chain  of  events  leading  to  an 
innovation  is  long  and  broad.  Many  researchers  over  many  years 
have  been  involved  in  the  chain,  and  many  funding  agencies,  some 
simultaneously  with  the  same  researchers,  may  have  been  involved. 
The  allocation  of  costs  and  benefits  under  such  circumstances  is  a 
very  difficult  and  highly  arbitrary  process.  The  allocation 
problem  is  reduced,  but  not  eliminated,  when  the  analysis  is 
applied  at  the  macro  level  (integrating  across  individual 
researchers,  organizations,  etc.). 

Six  critical  conditions  for  innovation  were  identified 
implicitly  and  explicitly  through  analysis  of  these  retrospective 
studies.  The  most  important  condition  from  the  author's 
perspective  implicitly  appears  to  be  the  existence  of  a  broad  pool 
of  knowledge  which  minimizes  critical  path  obstacles  and  can  be 
exploited  for  development  purposes.  The  time  required  to  overcome 
deficiencies  in  the  knowledge  pool  is  the  pacing  item  to  initiate 
the  research  exploitation  process.  This  condition  is  followed  in 
importance,  from  the  author's  perspective,  by  a  technical 
entrepreneur  who  sees  the  technical  opportunity  and  recognizes  the
need  for  innovation,  and  who  is  willing  to  champion  the  concept  for 
long  time  periods,  if  necessary.  While  the  technical  entrepreneur 
was  viewed  by  some  of  the  studies  as  most  important  to  the 
innovative  process,  it  does  not  appear  (to  the  author)  to  be  the 
critical  path  factor.  Examination  of  the  historiographic  tracings
which  display  the  significant  events  chronologically  for  each  of
the  innovations  shows  that  an  advanced  pool  of  knowledge  must  be
developed  in  many  fields  before  synthesis  leading  to  an  innovation
can  occur.  The  entrepreneur  can  be  viewed  as  an  individual  or
group  with  the  vision  and  ability  to  both  recognize  the  downstream
applications  (need)  for  the  research  and  to  assimilate  and/or
enhance  this  diverse  information  and  exploit  it  for  further
development.  However,  once  this  pool  of  knowledge  exists,  there 
are  many  persons  or  groups  with  capability  to  exploit  the 
information,  and  thus  the  real  critical  path  to  the  innovation  is 
more  likely  the  knowledge  pool  than  any  particular  entrepreneur. 
The  entrepreneurs  listed  in  the  studies  undoubtedly  accelerated  the 
introduction  of  the  innovation,  but  they  were  at  all  times  paced  by 
the  developmental  level  of  the  knowledge  pool. 

The  third  most  important  condition  is  early  recognition  of  the 
need,  coupled  with  early  efforts  taken  to  improve  acceptance  by  the 
user  community.  In  many  cases,  these  functions  will  be  performed 
by  the  entrepreneur.  Also  valuable  for  innovation  are  strong 
financial  and  management  support,  and  occurrence  of  an  unplanned 
confluence  of  technology  coupled  with  many  continuing  inventions  in 
different  areas  to  support  the  innovation. 

One  goal  of  all  the  studies  presented  was  to  identify  the 
products  of  research  and  some  of  their  impacts.  In  addition,  the 
studies  attempted  to  identify  environmental/  management  factors 
which  led  to  successful  research  and  to  rapid  conversion  of  the 
products  of  successful  research  to  technology.  The  Hindsight, 
TRACES,  and  DARPA  studies  tried  to  identify  factors  which 
influenced  the  productivity  and  impact  of  research.  The  following 
conclusions  about  the  role  and  impact  of  basic  research  were 
reached:


1.  The  majority  of  basic  research  events  which  directly 
impacted  technologies  or  systems  were  non-mission  oriented  and 
occurred  many  decades  before  the  technology  or  system  emerged; 

2.  The  cumulative  indirect  impacts  of  basic  research  were  not
accounted  for  by  any  of  the  retrospective  approaches  published; 

3.  An  advanced  pool  of  knowledge  must  be  developed  in  many
fields  before  synthesis  leading  to  an  innovation  can  occur; 

4.  Allocation  of  benefits  among  researchers,  organizations, 
and  funding  agencies  to  determine  economic  returns  from  basic 
research  is  very  difficult  and  arbitrary,  especially  at  the  micro 
level.

A  recent  workshop  on  technology  transfer  validated  the 
conclusions  of  these  classical  studies  [Isaacs,  1996],  at  least  in 
the  corporate  environment.  The  moderators  identified  the  following 
success  factors: 

1)  Management  and  Organizational  Infrastructure 

a.  An  organizational  model  that  encourages  coordination 
between  research  activities  and  product  projects 

b.  Executive-level  commitment  to  the  transfer  of  ideas  from
research  groups  to  development  groups

c.  Geographic  and  social  proximity  between  research  and 
development  groups 

2)  Technology  Push

a.  Research  projects  that  are  aligned  with  corporate  strategy 

b.  Research  projects  with  people  highly  motivated  to  see  their 
research  transferred  into  products 

c.  A  high-level  visionary  who  champions  bringing  the  idea  to 
market 

d.  Readily  demonstrable  improvements  over  existing  or  related 
products 

3)  Demand  Pull 

a.  A  product  group  motivated  and  poised  to  take  the  technology 

b.  A  significant  customer  with  a  strong  need  for  the 
technology 

c.  An  involved  marketing  group  that  tracks  customers'  needs 
and  markets  the  ideas  throughout  the  company 

These  and  similar  studies  also  identified  many  other  factors 
important  in  the  successful  evolution  of  science  to  technology. 
Additional  factors  include:  awareness  of  ongoing  research  through 
diverse  information  sources;  types  of  cooperative  R&D  agreements 
between  researchers  and  developers;  intellectual  property  issues 
such  as  disclosure,  protection,  marketing,  negotiating  and 
licensing;  Congressional  incentives  to  collaboration;  and  other 
legal,  financial,  cultural,  and  sociological  incentives  and 
roadblocks. 

Some  personal  observations  of  the  value  of  retrospective 
studies  for  accelerating  science  conversion  [Kostoff,  1997j],  based 
on  both  the  published  studies  and  the  author's  experiences  with 
conversion  of  science  to  technology,  will  now  be  presented.  From 
the  author's  viewpoint,  Project  Hindsight,  with  all  of  its
limitations,  produced  very  relevant  findings  for  the  science- 
technology  conversion  problem.  A  conceptual  principle  for 
accelerating  the  science-technology  conversion  can  be  abstracted 
from  the  Hindsight  results,  and  it  is  important  to  separate  the 
conceptual  principle  from  the  implementations  of  the  principle.  In 
this  manner,  one  does  not  become  bound  by  the  limitations  of  any 
particular  implementation.  This  principle,  termed  by  the  author  as 
Heightened  Dual  Awareness  (HDA),  states  that  in  order  for  the
science-technology  conversion  to  be  accelerated,  at  least  two 
necessary  conditions  must  be  fulfilled:  1)  the  researcher  must  be 
intimately  aware  of  the  needs  of  the  applications  engineer;  2)  the 
potential  user  of  the  research,  or  transitionee,  must  be  aware  of 
the  progress  and  results  of  the  research.  In  addition,  if  third 
parties  are  involved  in  the  conversion  and  development  process, 
such  as  vendors,  their  awareness  of  both  ends  of  the  conversion 
cycle  must  be  maintained  as  well.  To  the  degree  that  each  of  these 
requirements  is  not  fulfilled,  the  science-technology  conversion 
will  be  delayed.

The  author's  personal  observations  of  examples  of  science
which  has  converted  to  technology  rapidly  have  borne  out  the 
validity  of  the  HDA  principle,  and  of  the  above  studies' 
conclusions  related  to  evolution  of  research  into  successful 
systems.  For  example,  the  author  sponsored  research  at  the 
Department  of  Energy  (DOE)  National  Labs  for  many  years.  In  those 
cases  where  the  departments  in  which  the  research  was  conducted 
were  full  spectrum  S&T  organizations,  the  researchers  were  often 
the  developers  as  well,  and  in  any  case  were  well  aware  of  the 
needs  of  the  developers  and  users.  The  main  motivations  and 
incentives  were  to  transition  the  research  as  rapidly  as  possible, 
and  this  in  fact  is  what  occurred.  As  a  specific  example,  the 
Materials  Department  at  Oak  Ridge  National  Lab  was  a  full  spectrum
materials  R&D  operation.  Intermetallics  research  sponsored  by  the
author  for  space  applications  metamorphosed  into  the  high  impact
Ni3Al  alloy  research  and  development  for  terrestrial  applications. 
The  complete  cycle  from  research  to  advanced  development  was 
conducted  and  completed  very  rapidly  due  to  the  vertically 
integrated  materials  structure  at  Oak  Ridge. 

The  Oak  Ridge  example  illustrates  the  most  straightforward 
application  of  the  HDA  principle.  The  researchers  and  developers 
are  physically  contiguous,  and  in  many  cases  are  the  same  person. 
Thus,  the  dual  awareness  is  readily  effected  by  the  intrinsic 
structure  of  the  physical  environment,  and  complex  management 
structures  are  not  necessary  to  enhance  dual  awareness. 

When  the  author  worked  at  Bell  Laboratories  in  the  1960s  and 
70s,  the  research  functions  were  linked  closely  with  the  advanced 
development  functions  through  two  major  approaches.  First,  the 
more  applied  satellite  laboratories  were  usually  located  adjacent 
to  a  Western  Electric  development  and  manufacturing  facility,  in  a 
quasi-vertically  integrated  management  structure  (Bell  Labs  was  an
independent  corporation) .  As  in  the  Hindsight  case,  the 
researchers  were  well  aware  of  the  developers'  and  users'  needs, 
and  the  potential  users  were  kept  apprised  of  the  status  of  the 
research.  This  allowed  simultaneous  technology  push  and  demand 
pull,  and  transitions  occurred  smoothly  and  rapidly. 

Second,  in  the  more  centralized  facilities  in  which  the 
fundamental  research  was  conducted,  such  as  the  Murray  Hill 
laboratory,  academic  freedom  characteristic  of  universities  was 
combined  with  facility  and  staff  support  characteristic  of  the  best 
industrial  labs,  with  easy  access  to  the  developers.  Not  only  did 
these  centralized  facilities  contain  contiguous  applied  research 
and  development  components,  but  the  technical  managers  tended  to  be 
career  Bell  System  employees  who  were  extremely  knowledgeable  about 
the  technological  and  operational  needs  of  many  different  segments 
of  the  Bell  System.  Management  awareness  of  both  the  research
status  and  potential,  and  the  technology  and  system  needs,  helped
strengthen  the  necessary  linkages  between  basic  research  and  the 
developers.  A  recent  article  [Heppenheimer,  1996]  on  the 
development  of  the  transistor  by  Bell  Labs  illustrates  this  point. 
Following  the  invention  of  the  point-contact  transistor,  the 
research  director  did  not  tell  the  inventor  to  redirect  his  work
toward  further  developing  and  refining  the  product.  Instead,  he
gave  that  effort  to  another  manager,  and  left  the  inventor  free  to 
seek  newer  frontiers. 

In  the  Department  of  the  Navy,  much  of  the  research  at  the 
Warfare  Centers  (full  spectrum  R&D  organizations)  is  sponsored
through  the  program  managed  by  the  author,  the  In-House  Laboratory 
Independent  Research  program.  Here,  the  Technical  Directors  of  the 
Warfare  Centers  select  projects  focused  on  the  Centers'  mission 
requirements.  The  researchers  tend  to  work  part-time  in 
development  activities,  and  are  continuously  aware  of  both  naval 
Fleet  requirements  and  the  state-of-the-art  in  the  research 
community.  Similar  to  the  Oak  Ridge  example  presented  previously,
when  the  researchers  operate  in  such  an  applications-aware 
environment,  their  new  ideas  and  concepts  tend  to  be  naturally 
associated  with  the  naval  applications,  and  have  a  higher 
probability  of  eventual  utility.  Fleet  and  technology  impacts  from 
this  program  have  been  substantial  [ONR,  1996]. 

The  HDA  principle  as  a  major  driver  of  eventual  utility  is  not 
limited  to  the  performer  and  potential  user;  it  is  applicable  to 
the  research  sponsor  environment  as  well.  A  number  of  research 
sponsoring  organizations  have  switched  from  a  discipline 
orientation  to  a  structure  where  the  research  is  vertically 
integrated  with  technology,  analogous  to  the  vertically  integrated 
research-technology  performer  environment  described  above.

For  example,  in  1993,  the  Office  of  Naval  Research  (ONR),  a 
science  and  technology  development  sponsor,  switched  to  such  a 
structure  in  part  for  the  purpose  of  closing  the  gap  between 
science  and  technology,  and  initial  indications  are  that  this  is 
indeed  occurring.  ONR's  program  officers  (POs)  are  responsible  for 
the  range  spanning  research  to  advanced  development,  and,  as  in  the 
integrated  laboratory  environment,  are  intimately  aware  of  the 
needs  of  the  users.  The  POs  have  the  incentives  to  transition  the 
research  to  development  as  rapidly  as  possible. 

The  general  conclusion  that  the  author  has  drawn  is  that  for 
most  effective  and  efficient  conversion  of  science  to  technology, 
the  researcher  primarily  and  the  sponsor  secondarily  need  to  be 
immersed  in  environments  where  the  HDA  principle  is  most  operative, 
and  where  motivations  and  incentives  are  geared  toward  rapid 
transitioning.  This  type  of  physical  environment  is  realized  most 
efficiently  when  the  researchers  and  developers  are  physically 
contiguous.  If  this  type  of  physical  environment  structure  is  not 
readily  possible,  as  may  be  the  case  with  some  extremely 
fundamental  university  research,  then  attempts  should  be  made  to 
simulate  this  optimal  transitioning  environment  through  innovative 
management  structures.  This  should  not  be  interpreted  as  a 
recommendation  to  substitute  applied  research  for  basic  research. 
Far  too  much  of  this  substitution  has  occurred  in  the  recent  past. 
Rather,  the  recommendation  is  that  basic  research  be  conducted  in 
an  environment  where  there  is  greater  awareness  of  the  progress  and 
potential  of  the  research  by  potential  transitionees  and  users,  and 
opportunities  to  understand  the  needs  of  the  developers  are  made 
available  to  the  researchers. 

The  irony  is  that  the  optimal  transitioning  research  performer
environment,  from  a  physical  structure  viewpoint,  exists  most
strongly  (on  average)  today  in  two  types  of  organizations:  large
corporate  R&D  labs  and  large  government  or  national  labs.  Yet
non-government-financed  basic  research  has  essentially  disappeared  from
the  large  non-medical  corporate  labs,  and  the  large  government  and 
national  labs  are  being  downsized.  This  trend  can  only  impact  the 
conversion  of  mission-oriented  research  negatively. 

For  mission-oriented  agencies,  to  enhance  the  simulation  of 
optimal  transitioning  physical  structures,  joint  university-federal 
or  national  or  corporate  laboratory  projects  should  be  expanded. 
In  parallel,  as  the  author's  personal  observations  have  also  shown, 
the  potential  user  needs  to  become  involved  in  the  research  project 
as  early,  broadly,  and  intensely  as  possible.  This  early 
involvement  provides  the  user  a  sense  of  'ownership',  and  produces
a  more  seamless  transition  process.  In  the  author's  experience, 
incorporating  the  potential  user  from  the  research  proposal 
evaluation  phase  is  not  too  soon  for  successful  downstream 
transitions  of  the  research  products  to  technology. 

In  summary,  while  the  published  retrospective  studies  do
provide  interesting  information  and  insight  into  the  transition 
process  from  research  to  development  to  products,  processes,  or 
systems,  the  arbitrary  selectivity  and  anecdotal  nature  of  many  of 
the  results  render  any  conclusions  as  to  cost-effectiveness  or 
generalizability  suspect.  Supplementary  analyses  using  other
approaches  are  required  for  further  justification  of  the  value  of 
the  R&D.

IV-C.  QUANTITATIVE  METHODS 

BACKGROUND  AND  OVERVIEW 

As  the  U.S.  national  debt  and  annual  deficit  have  increased  in 
recent  years,  the  government  has  been  forced  to  focus  its  research 
more  on  strategic  goals,  in  order  to  justify  continuance  of 
research  funding  in  light  of  other  urgent  national  priorities. 
There  is  increased  competition  for  scarce  funds  in  the  Federal 
arena.  Basic  research,  with  its  long-term  payoff  horizon,  now  has 
to  compete  strongly  with  Medicare,  welfare,  and  other  service
provision  and  development  programs.  In  Europe  and  Asia,  basic 
research  has  undergone  a  similar  transformation,  with  more  of  a 
strategic  focus  to  the  research.  Since  the  government  throughout 
the  world  is  now  essentially  the  only  supporter  of  basic  research, 
eventually  the  draining  of  the  fundamental  knowledge  pool  will 
begin  to  affect  the  more  applied  work  which  draws  upon  the  pool. 
The  U.S.  has  not  experienced  these  latent  effects  of  fundamental 
research  deficiency  in  a  major  way  yet,  because  of  the  rich  legacy 
of  industrial  and  government  support  both  at  home  and  abroad  and 
the  continuing  production  of  basic  research  from  different 
government  sources. 

In  this  environment  of  scarce  government  funds,  accountability 
of  all  government  programs  has  increased  substantially.  There  are 
two  major  characteristics  of  recent  focus  in  increased
accountability:  more  detailed  programmatic  information  is  requested 
by  the  program  assessors,  and  more  quantified  information  is 
requested.  The  upsurge  in  computer  availability  over  the  past 
decade  has  enabled  large  quantities  of  detailed  information  to  be 
stored,  tracked,  and  interpreted,  and  has  driven  the  request  for 
the  large  volumes  of  detailed  program  information.  The  request  for 
increased  quantitative  information  also  derives  from  the  increased 
computer  capabilities  for  handling  and  analyzing  large  amounts  of 
this  type  of  data.  In  addition,  there  is  substantial  motivation 
from  the  assessors  to  have  simple  quantitative  indicators  which 
could  drive  the  resource  allocation  process,  and  substantiate  and 
justify  the  resource  allocation  decisions  that  are  generated. 

For  service,  production,  and  some  types  of  development 
programs,  these  quantitative  indicators  are  applicable,  meaningful, 
and  useful  in  the  assessment  process.  In  the  area  of  fundamental 
research,  however,  there  is  not  uniform  agreement  on  the  validity 
of  quantitative  indicators  for  assessment  purposes.  Even  among 
those  who  agree  that  quantitative  indicators  have  a  role  in  basic 
research  assessment,  there  is  not  universal  agreement  as  to  which 
indicators  are  valid,  and  how  and  whether  they  should  be  combined 
with  other  quantitative  indicators  and  non-quantitative  approaches 
in  order  to  arrive  at  a  complete  and  meaningful  system  for  research 
assessment. 

This  section  addresses  some  critical  issues  in  the 
applicability  of  quantitative  performance  measures  to  the 
assessment  of  basic  research,  which  today  is  essentially  synonymous
with  government  sponsored  basic  research.  The  strengths  and 
weaknesses  of  metrics  applied  as  research  performance  measures  are 
examined.  In  particular,  the  application  of  metrics  in  the  context 
of  the  Government  Performance  and  Results  Act  of  1993  is  discussed 
and  critiqued  in  section  IV-C-1.  The  remainder  of  section  IV-C 
provides  an  overview  of  the  quantitative  approaches  used  in 
research  assessment. 

Quantitative  approaches  to  research  assessment  focus  on  the 
numerics  associated  with  the  performance  and  outcomes  of  research. 
The  main  approaches  used  are  bibliometrics  and  econometrics  such  as 
cost-benefit  and  production  function  analysis.  This  section 
focuses  on  these  three  main  approaches,  then  describes  the 
bibliometrics-related  family  of  approaches  known  as  co-occurrence 
phenomena,  then  describes  a  network  modeling  approach  to 
quantifying  research  impacts,  and  ends  with  an  expert  systems 
approach  for  supporting  research  assessment. 

BIBLIOMETRICS 

Bibliometrics,  especially  evaluative  bibliometrics,  uses 
counts  of  publications,  patents,  citations  and  other  potentially 
informative  items  to  develop  science  and  technology  performance 
indicators.  The  choice  of  important  bibliometric  indicators  to  use 
for  research  performance  measurement  may  not  be  straightforward. 
A  recent  study  surveyed  about  4,000  researchers  to  identify 
appropriate  bibliometric  indicators  for  their  particular
disciplines  [Australia,  1993].  The  respondents  were  grouped  in
major  discipline  categories  across  a  broad  spectrum  of  research
areas.  While  the  major  discipline  categories  agreed  on  the 
importance  of  publications  in  refereed  journals  as  a  performance 
indicator,  there  was  not  agreement  about  the  relative  values  of  the 
remaining  19  indicators  provided  to  the  respondents.  For  the 
respondents  in  total,  the  important  performance  indicators  were: 

1.  Publications  (publication  of  research  results  in  refereed
journals);

2.  Peer  Reviewed  Books  (research  results  published  as
commercial  books  reviewed  by  peers);

3.  Keynote  Addresses  (invitations  to  deliver  keynote
addresses,  or  present  refereed  papers  and  other  refereed
presentations  at  major  conferences  related  to  one's  profession);

4.  Conference  Proceedings  (publication  of  research  results  in
refereed  conference  proceedings);

5.  Citation  Impact  (publication  of  research  results  in
journals  weighted  by  citation  impact);

6.  Chapters  in  Books  (research  results  published  as  chapters
in  commercial  books  reviewed  by  peers);

7.  Competitive  Grants  (ability  to  attract  competitive,  peer
reviewed  grants  from  the  ARC,  NH&MRC,  rural  R&D  corporations  and
similar  government  agencies).

These  bibliometric  indicators  can  be  used  as  part  of  an 
analytical  process  to  measure  scientific  and  technological 
accomplishment.  Because  of  the  volume  of  documented  scientific  and 
technological  accomplishments  being  produced  (5,000  scientific 
papers  published  in  refereed  scientific  journals  every  working  day 
worldwide;  1,000  new  patent  documents  issued  every  working  day 
worldwide),  use  of  computerized  analyses  incorporating  quantitative
indicators  is  necessary  to  understand  the  implications  of  this 
technical  output  [Narin,  1994]. 

Narin  states  three  axioms  that  underlie  the  utilization  and
validity  of  bibliometric  analysis.  The  first  axiom  is  activity 
measurement:  that  counts  of  patents  and  papers  provide  valid 
indicators  of  R&D  activity  in  the  subject  areas  of  those  patents  or 
papers,  and  at  the  institution  from  which  they  originate.  The 
second  axiom  is  impact  measurement:  that  the  number  of  times  those 
patents  or  papers  are  cited  in  subsequent  patents  or  papers 
provides  valid  indicators  of  the  impact  or  importance  of  the  cited 
patents  and  papers.  However,  there  could  be  weightings  applied  to 
the  raw  count  data,  depending  on  the  perceived  importance  of  the 
journals  containing  the  citing  papers.  Also,  the  impacts  would  be 
on  allied  research  fields  or  technologies,  not  necessarily
long-term  impacts  on  the  originating  organization's  mission.  The  third
axiom  is  linkage  measurement:  that  the  citations  from  papers  to 
papers,  from  patents  to  patents  and  from  patents  to  papers  provide 
indicators  of  intellectual  linkages  between  the  organizations  which 
are  producing  the  patents  and  papers,  and  knowledge  linkage  between 
their  subject  areas  [Narin,  1994].
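
As an illustration only, the three axioms can be sketched on a toy data set. The record layout, the institution names, and the function names below are hypothetical and are not taken from Narin [1994]; the sketch merely shows which count each axiom relies on.

```python
from collections import Counter

# Hypothetical toy records: (paper id, originating institution, ids of cited papers).
PAPERS = [
    ("p1", "Lab A", []),
    ("p2", "Lab A", ["p1"]),
    ("p3", "Lab B", ["p1", "p2"]),
    ("p4", "Lab C", ["p1"]),
]


def activity_counts(papers):
    """Axiom 1 (activity): number of papers originating at each institution."""
    return Counter(institution for _, institution, _ in papers)


def citation_counts(papers):
    """Axiom 2 (impact): number of times each paper is cited by other papers."""
    return Counter(cited for _, _, refs in papers for cited in refs)


def institution_links(papers):
    """Axiom 3 (linkage): citations from a citing institution to a cited one."""
    home = {paper_id: institution for paper_id, institution, _ in papers}
    links = Counter()
    for _, citing_institution, refs in papers:
        for cited in refs:
            if cited in home:  # ignore references outside the data set
                links[(citing_institution, home[cited])] += 1
    return links


activity = activity_counts(PAPERS)
impact = citation_counts(PAPERS)
links = institution_links(PAPERS)
```

These are raw counts; the weightings and caveats noted above (journal importance, impact on allied fields rather than on the originating mission) would have to be layered on top of them.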

Use  of  bibliometrics  can  be  categorized  into  four  levels  of 
aggregation  [Narin,  1994]: 

1.  policy  (evaluation  of  national  or  regional  technical
performance);

2.  strategy  (evaluation  of  the  scientific  performance  of
universities  or  the  technological  performance  of  companies);

3.  tactics  (tracing  and  tracking  R&D  activity  in  specific
scientific  and  technological  areas  or  problems);

4.  conventional  (identifying  specific  activities  and  specific
people  engaged  in  research  and  development).

Policy  questions  deal  with  the  analysis  of  very  large  numbers 
of  papers  and  patents,  often  hundreds  of  thousands  at  a  time,  to 
characterize  the  scientific  and  technological  output  of  nations  and 
regions.  Strategic  analyses  tend  to  deal  with  thousands  to  tens  of 
thousands  of  papers  or  patents  at  a  time,  numbers  that  characterize 
the  publication  or  patent  output  of  universities  and  companies. 
Tactical  analyses  tend  to  deal  with  hundreds  to  thousands  of  papers 
or  patents,  and  deal  typically  with  activity  within  a  specific 
subject  area.  Finally,  conventional  information  retrieval  tends  to 
deal  with  identifying  individual  papers,  patents,  and  clusters  of 
interest  to  an  individual  scientist  or  engineer  or  research  manager 
working  on  a  specific  research  project  [Narin,  1994]. 

The  first,  and  major,  step  in  the  performance  of  a  high 
quality  bibliometric  analysis  in  any  of  the  above  four  levels  of 
aggregation  is  acceptance  by  the  potential  user  of  the  above  three 
axioms  to  validate  the  credibility  of  the  bibliometric  approach. 
Once  this  hurdle  has  been  passed,  the  second  step  is  to  select  the 
highest  quality  and  reliability  raw  indicator  products  (data  and 
databases)  and  apply  analyses  of  the  highest  statistical  precision
and  accuracy  to  these  indicators  [Braun,  1989,  1990,  1993].  The 
third  step,  which  in  many  cases  will  determine  the  utility  of  the 
results,  is  the  interpretation  and  visual  display  of  the  results. 
The  results  of  the  most  stringent  analyses  will  be  relatively 
worthless  if  they  are  not  displayed  in  a  concise  and  lucid  form. 

Indicators  can  be  arranged  in  one  or  more  dimensions. 
Emphasis  has  always  been  laid  on  the  necessity  of  multidimensional 
thinking  while  analyzing  scientometric  indicators.  Scientific 
research  is  a  multifaceted  human  activity,  and  overemphasizing  any 
of  its  aspects  (publication  productivity,  citation  influence, 
technological  applicability,  etc.)  may  lead  to  serious  distortions 
in  its  assessment.  While  each  scientometric  indicator  represents 
a  single  component  of  a  multidimensional  manifold  which  itself  is 
just  one  element  in  assessing  a  complex  system,  presentations  in 
one  or  several  dimensions  may  equally  prove  useful  [Braun,  1993]. 

The  most  direct  way  of  presenting  scientometric  indicators  is 
in  one  dimensional  ranked  lists.  While  simplistic,  this  approach 
reflects  the  paramount  competitiveness  of  the  scientific 
enterprise.  Linear  rankings  are  most  attractive  for  presentation 
to  the  larger  non-specialist  audience  (see  Braun  [1993]). 

Two  dimensional  displays  can  include  relational  charts  or
scatter  plots  for  correlations.  In  two  dimensional  relational
charts  [Schubert,  1986;  Braun,  1987],  pairs  of  indicators  (observed
vs.  expected  citation  rates  or  attractivity  vs.  activity
indices)  are  displayed  in  a  planar  orthogonal  coordinate  system.
Emphasis  is  shifted  from  ranking  to  the  formation  of  groups  or
'clusters'  and  other  characteristic  relations  among  various
indicators.
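
A minimal sketch of the one and two dimensional presentations follows. The unit names and citation rates are invented for illustration, the ranking key (ratio of observed to expected citation rate) is only one possible choice, and treating the line where observed equals expected as the dividing diagonal of the chart is an assumption rather than a convention quoted from the cited references.

```python
# Hypothetical units with (observed, expected) citation rates per paper.
UNITS = {
    "Unit A": (4.2, 3.0),
    "Unit B": (1.8, 2.4),
    "Unit C": (2.5, 2.5),
}


def ranked_list(units):
    """One-dimensional display: unit names ordered by observed/expected ratio."""
    return sorted(
        units,
        key=lambda name: units[name][0] / units[name][1],
        reverse=True,
    )


def chart_region(observed, expected, tolerance=1e-9):
    """Two-dimensional display: position of a point relative to the diagonal."""
    if abs(observed - expected) <= tolerance:
        return "on diagonal"
    return "above diagonal" if observed > expected else "below diagonal"


ranking = ranked_list(UNITS)
regions = {name: chart_region(obs, exp) for name, (obs, exp) in UNITS.items()}
```

The ranked list collapses each unit to a single number, while the chart keeps both coordinates, which is what allows groups of units with similar characteristics to show up as clusters.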

An  obvious  deficiency  of  the  relational  charts  is  the  lack  of 
any  indication  of  the  size  of  the  sets  of  publications  underlying 
the  points  of  the  diagram.  By  adding  the  third  dimension  of 
publication  size,  this  objection  can  be  overcome.  The  basic  idea 
of  'landscaping'  national  scientific  performances  is  to  represent 
the  size  by  the  'mass'  of  a  mountain-like  formation.  If  two  or 
more  countries  have  similar  citation  characteristics,  the  peaks 
representing  them  may  get  superimposed  forming  chains,  massifs,  and 
other  surface  formations.  An  example  is  presented  in  Braun  [1991]. 

There  seems  to  be  a  natural  limit  of  graphical  presentation  at 
three  dimensions.  There  are  techniques,  however,  to  overcome  this 
apparent  restriction.  A  rather  original  method  of  representing 
multivariate  data  was  proposed  by  Herman  Chernoff:  "Each  point  in 
k-dimensional  space,  k<=18,  is  represented  by  a  cartoon  face  whose 
features,  such  as  length  of  nose  and  curvature  of  mouth,  correspond
to  components  of  the  point.  Thus  every  multivariate  observation  is
visualized  as  a  computer  drawn  face.  This  presentation  makes  it 
easy  for  the  human  mind  to  grasp  many  of  the  essential  regularities 
and  irregularities  present  in  the  data." 

Braun  [1993]  shows  a  face  pattern  with  18  facial  features 
applicable  in  representing  multidimensional  data.  Schubert  [1992] 
contains  a  four-dimensional  example  of  applying  Chernoff  faces  in
scientometrics;  uncitedness,  citation  rate  per  cited  paper,  mean 
expected  citation  rate  and  relative  citation  rate  are  represented 
by  the  shape  of  face,  size  of  eyes,  length  of  nose  and  curvature 
and  length  of  mouth,  respectively. 

Problems  with  Bibliometrics 

Generating  the  bibliometric  raw  data  and  performing  computer 
manipulations  on  this  data  are  relatively  straightforward 
processes.  Interpreting  and  assigning  meaning  to  this  data  lies  at 
the  source  of  the  difficulties  with  bibliometrics.  A  personal 
anecdote  partially  illustrates  this  point. 

A  few  years  ago,  the  author  was  asked  to  be  part  of  a  team 
which  reviewed  a  component  of  a  large  agency  laboratory. 
Identification  of  the  agency  and  laboratory  is  not  important  for 
this  discussion.  The  team  judged  the  work  of  the  component  to  be 
excellent,  but  the  number  of  papers  produced  relative  to  the 
component's  funding  was  extremely  small.  Since  the  agency  was 
trying  to  improve  publication  output  of  its  laboratories,  the  team 
recommended  that  the  component  try  to  increase  its  publications. 

A  couple  of  years  later,  the  team  revisited  the  laboratory 
component.  This  time,  the  publication  record  was  much  improved. 
However,  had  the  quality  of  research  improved?  No,  the  quality  was 
excellent  in  the  first  review  and  remained  excellent  in  the  second
review.  Had  the  quantity  of  research  increased?  No;  in  fact,  one
could  probably  make  the  argument  that  there  was  less  research
produced,  since  research  time  had  to  be  sacrificed  in  writing  the 
extra  papers.  Were  the  users  more  satisfied?  No,  since  in  either 
case  the  direct  users  were  getting  the  quantity  and  quality 
research  product  they  wanted,  and  were  converting  it  to  technology. 

There  appeared  to  be  three  main  benefits  of  emphasis  on 
publication.  First,  there  was  increased  dissemination  of  the 
laboratory's  results  to  the  larger  research  community,  which 
theoretically  could  have  been  of  value  to  the  community  not 
familiar  with  the  laboratory's  work.  Second,  the  agency  improved  its
bibliometric  statistics,  which  it  could  then  display  as  an  example 
of  increasing  research  productivity.  In  addition,  there  was 
probably  some  enhancement  of  the  laboratory's  and  researchers' 
prestige  due  to  the  increased  recognition  in  the  published 
literature. 

The  main  point  to  be  derived  from  the  above  anecdote  is  that
the  fundamental  bibliometric  unit,  the  published  paper  in  a  peer
reviewed  journal,  is  not  research;  it  is  a  documentation  of 
research.  While  its  contents  are  important  in  disseminating  the 
research  results  and  evaluating  the  quality  and  quantity  of 
research  produced,  the  documentation  counts  need  to  be  associated 
with  many  more  caveats  and  to  be  supported  by  much  interpretation 
before  they  can  become  useful  in  a  research  evaluation. 

A  comprehensive  review  of  bibliometrics  [White,  1989]  shows 
the  sparsity  of  bibliometric  studies  for  research  impact  evaluation 
reported  by  the  Federal  government.  This  is  due  in
part  to  the  following  problems  with  publication  and  citation  counts
[King,  1987;  Oberski,  1988;  OTA,  1986]: 

1)  Publication  counts: 

a.  indicate  quantity  of  output,  not  quality;

b.  non-journal  methods  of  communication  ignored;

c.  publication  practices  vary  across  fields,  journals,
employing  institutions;

d.  choice  of  a  suitable,  inclusive  database  is  problematical; 

e.  undesirable  publishing  practices  (artificially  inflated 
numbers  of  co-authors,  artificially  shorter  papers)  increasing. 

2)  Citations: 

a.  intellectual  link  between  citing  source  and  reference 
article  may  not  always  exist; 

b.  incorrect  work  may  be  highly  cited; 

c.  methodological  papers  among  most  highly  cited; 

d.  self-citation  may  artificially  inflate  citation  rates; 

e.  citations  lost  in  automated  searches  due  to  spelling 
differences  and  inconsistencies; 

f.  Science  Citation  Index  (SCI)  changes  over  time; 

g.  SCI  biased  in  favor  of  English  language  journals; 

h.  same  problems  as  publication  counts. 

In  response  to  Cawkell's  [1977]  claims  that  'citation
anomalies  have  little  effect  -  they  are  like  random  noise  in  the
presence  of  strong  repetitive  signals,'  MacRoberts  [1989]  stated 
the  Federal  concerns  about  bibliometrics  eloquently:  "When  only  a 
fraction  of  influences  are  cited,  when  what  is  cited  is  a  biased 
sample  of  what  is  used,  when  influences  from  the  informal  level  of 
scientific  communication  are  excluded,  when  citations  are  not  all 
the  same  type,  and  so  on,  the  'signal'  may  be  repetitive,  but  it  is 
also  weak,  distorted,  fragmented,  incoherent,  filtered,  and  noisy". 

Another  reason  for  limited  Federal  use  can  be  inferred  from 
Narin  [1976],  where  studies  on  the  publication  and  citation 
distribution  functions  for  individuals  are  reviewed.  The 
conclusion  drawn,  from  studies  such  as  those  of  Lotka,  Shockley,  De 
Solla  Price,  and  Cole  and  Cole,  is  that  very  few  of  the  active 
researchers  are  producing  the  heavily  cited  papers.  How  motivated 
are  funding  agencies  to  report  these  hyperbolic  productivity 
distributions  for  different  programs  in  the  open  literature, 
especially  since  many  questions  exist  as  to  the  accuracy  and 
completeness  of  the  bibliometric  indicators?  This  conclusion 
raises  the  further  question  of  the  role  actually  played  by  the  less 
productive  researchers  (as  measured  by  publication  and  citation
counts):  is  the  productivity  of  the  elite  somehow  dependent  on  the
output  of  the  less  influential,  or  is  the  role  of  the  less 
productive  members  that  of  maintaining  the  stability  of  the 
research  infrastructure  and  educating  future  generations  of 
researchers? 
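
To give a feel for how skewed such distributions can be, the sketch below assumes Lotka's inverse-square form, in which the number of authors producing n papers falls off as 1/n^2. The population of 1,000 single-paper authors, the cutoff of 10 papers, and the threshold of 5 papers for a "prolific" author are arbitrary assumptions, and the sketch addresses publication counts only, not citations.

```python
def lotka_author_counts(single_paper_authors, max_papers, exponent=2.0):
    """Expected number of authors with n papers, for n = 1..max_papers."""
    return {
        n: single_paper_authors / n ** exponent
        for n in range(1, max_papers + 1)
    }


# Assumed population: 1,000 authors with exactly one paper, up to 10 papers each.
counts = lotka_author_counts(single_paper_authors=1000, max_papers=10)

total_authors = sum(counts.values())
total_papers = sum(n * authors for n, authors in counts.items())

# Fraction of all authors who publish only once.
share_single = counts[1] / total_authors

# "Prolific" is an arbitrary threshold of five or more papers.
prolific_authors = sum(a for n, a in counts.items() if n >= 5)
prolific_papers = sum(n * a for n, a in counts.items() if n >= 5)
prolific_author_share = prolific_authors / total_authors
prolific_paper_share = prolific_papers / total_papers
```

Under these assumptions the prolific authors account for a share of the papers several times larger than their share of the author population, which is the kind of concentration the studies cited above describe.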

Potential  Normalization  Approaches 

Another  problem  with  bibliometrics  is  cross-discipline 
comparisons  of  outputs.  For  example,  how  should  the  paper  or 
citation  output  of  a  program  in  Solid-State  Physics  be  compared  to 
that  of  Shallow  Water  Acoustics?  What  types  of  normalizations  are
required  to  allow  comparisons  among  these  different  types  of
programs  and  fields?  Is  there  a  threshold  for  disaggregation  below
which  the  normalization  factors  apply  to  all  the  subfields?  For
example,  can  the  normalization  factor  for  Acoustics  be  applied  to 
a  program  in  High  Frequency  Shallow  Water  Acoustics,  or  can  the 
normalization  factor  for  Shallow  Water  Acoustics  be  applied  to  the 
program  in  High  Frequency  Shallow  Water  Acoustics? 

While  many  researchers  and  organizations  have  been  concerned 
about  this  issue,  a  group  centered  at  the  Library  of  the  Hungarian 
Academy  of  Sciences  has  been  addressing  the  problem  of  output 
comparisons,  including  cross-discipline  comparisons,  in  detail  for 
many  years.  The  following  normalization  solutions  they  propose  are 
excerpted  from  a  recent  publication  [Schubert,  1993].  In  addition, 
the  author  has  recently  generated  a  new  approach  for  comparing 
citation  rates  across  different  disciplines  [Kostoff,  1997m],  and
excerpts  are  contained  in  Section  IV-C-2. 

1.  The  Publishing  Journal  as  Reference  Standard 

Primary  journals  in  science  are  generally  agreed  to  contain 
coherent  sets  of  papers  both  in  contents  and  in  professional 
standards.  This  coherence  stems  from  the  fact  that  most  journals
are  nowadays  specialized  in  quite  narrow  subdisciplines  and  the
"gatekeepers"  (i.e.,  the  editors  and  referees)  controlling  the
journal  are  members  of  an  "invisible  college"  sharing  their  views 
on  questions  like  relevance,  validity  or  quality. 

It  seems,  therefore,  justified  to  expect  the  same  level  of 
citation  rate  for  papers  published  in  the  same  journal  at  the  same 
time.  If  two  such  papers  receive  a  different  number  of  citations, 
one  may  rightly  suspect  that  this  reflects  differences  in  their 
inherent  qualities.  By  relating  the  number  of  citations  received 
by  a  paper  (or  the  average  citation  rate  of  a  subset  of  papers 
published  in  the  same  journal  -  the  Mean  Observed  Citation  Rate,
MOCR)  to  the  average  citation  rate  of  all  papers  in  the  journal 
(the  Mean  Expected  Citation  Rate,  MECR)  the  Relative  Citation  Rate 
(RCR)  will  be  obtained.  This  indicator  shows  the  relative  standing 
of  the  paper  (or  set  of  papers)  in  question  among  its  close 
companions:  its  value  is  higher/lower  than  unity  as  the  sample  is
more/less  cited  than  the  average.  In  general,  sets  of  papers  under
investigation  are  published  in  more  than  one  journal;  in  that  case, 
the  mean  expected  citation  rate  (MECR)  can  be  defined  as  the 
weighted  average  citation  rate  of  the  journals.  (The  weights  are,  of
course,  the  publication  frequencies  in  the  respective  journals.) 
The  mean  observed  citation  rate  (MOCR),  i.e.,  the  average  citation 
rate  per  paper  can  again  be  related  to  the  MECR  to  result  in  the 
relative  citation  rate  (RCR),  indicating  the  relative  impact  of  the
papers  in  question  among  the  average  papers  of  the  publishing 
journals  as  reference  standard. 
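
The definitions above can be sketched in a few lines. The journal names, citation counts, and function names are hypothetical; the sketch assumes that each journal's average citation rate for the period is already known, and it takes the MECR as the journal rates averaged once per paper, which is equivalent to weighting each journal by its publication frequency in the set.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def relative_citation_rate(papers, journal_rates):
    """Return (MOCR, MECR, RCR) for a set of papers.

    papers: list of (journal, citations received by the paper).
    journal_rates: journal -> average citation rate of all papers in it.
    """
    # Mean Observed Citation Rate: average citations per paper in the set.
    mocr = mean(citations for _, citations in papers)
    # Mean Expected Citation Rate: one journal rate per paper, so each
    # journal is weighted by how many of the set's papers it published.
    mecr = mean(journal_rates[journal] for journal, _ in papers)
    return mocr, mecr, mocr / mecr


# Hypothetical set: two papers in each of two journals.
PAPERS = [("Journal X", 6), ("Journal X", 2), ("Journal Y", 1), ("Journal Y", 3)]
JOURNAL_RATES = {"Journal X": 4.0, "Journal Y": 1.0}

mocr, mecr, rcr = relative_citation_rate(PAPERS, JOURNAL_RATES)
```

An RCR above unity means the set is cited more than the average paper of its own publishing journals; the value says nothing about how those journals compare with journals in other fields.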

There  are  some  weaknesses  inherent  in  using  the  publishing 
journal  as  reference  standard.  Papers  published  in 
multidisciplinary  journals  are  measured  by  common  standards,  which 
might  be  clearly  unfair,  say,  for  a  geoscience  article  published  in 
Nature  together  with  a  molecular  genetics  paper.  Since  journals 
form  a  virtually  continuous  spectrum  from  highly  specialized  to 
multidisciplinary,  and  different  research  fields  or  even 
subcommunities  in  the  same  field  may  typically  use  different 
segments  of  this  spectrum,  the  unbiasedness  of  the  reference 
standards  must  be  thoroughly  checked  whenever  comparative 
assessments  are  based  on  the  RCR  indicator. 

As  a  rule,  it  can  be  said  that  in  coherent  research  fields, 
where  papers  are  usually  published  in  specialized  journals  (as  is 
the  general  trend  in  contemporary  science)  publishing  journals  as
reference  standards  and  RCR  as  indicator  can  readily  be  proposed 
for  comparative  assessments.  It  must,  however,  be  added  that  even 
in  such  cases  extension  from  one  to  two  dimensions  may  multiply  the 
effectiveness  of  the  analysis. 

2.  The  Set  of  Related  Records  as  Reference  Standard 

"Bibliographic  Coupling"  uses  the  number  of  references  a  given 
pair  of  documents  have  in  common  to  measure  the  similarity  of  their 
subject  matter.  Comparing  a  set  of  papers  that  are  "similar"  in 
this  sense  to  a  given  article  of  the  same  age  will  yield  an  ideal 
reference  standard  for  citation  assessments.  This  apparently 
simple  and  straightforward  method  has  long  been  practically
unaccomplishable  because  of  the  technical  difficulties  of  collecting
the  "coupled"  papers,  by  using  any  traditional  version  of  citation
indexes.

Fortunately,  the  situation  has  radically  changed  with  the 
advent  of  the  CD-ROM  edition  of  the  Science  Citation  Index 
database.  The  SCI  CD  Edition  uses  bibliographic  coupling  under  the 
name  related  records.  Two  records  are  considered  "related"  when 
they  list  a  number  of  identical  papers  in  their  respective 
bibliographies.  Related  records  of  an  article  are  other  articles 
published  during  the  same  period  that  cite  at  least  one  of  the  same 
references  that  the  "parent"  article  cited.  Because  they  have 
references  in  common,  an  article  and  its  related  records  are 
supposed  to  be  also  related  by  subject.  In  general,  the  more 
references  in  common,  the  stronger  the  subject  similarity  between
two  articles.  The  SCI  CD  Edition  has  a  built-in  possibility  for 
searching  related  records:  a  maximum  of  20  related  records  are 
available  for  any  given  record  ranked  by  strength  of  relatedness. 
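
The related-records idea described above can be sketched in a few lines. This is a minimal illustration, not the SCI CD Edition's actual algorithm: the function names and the toy bibliographies are hypothetical, and the "same period" restriction mentioned in the text is assumed to have been applied already when the input set was assembled.

```python
from typing import Dict, List, Set, Tuple


def coupling_strength(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Bibliographic coupling strength: number of shared references."""
    return len(refs_a & refs_b)


def related_records(
    parent: str,
    bibliographies: Dict[str, Set[str]],
    limit: int = 20,
) -> List[Tuple[str, int]]:
    """Rank other documents by the references they share with the parent."""
    parent_refs = bibliographies[parent]
    scored = [
        (doc, coupling_strength(parent_refs, refs))
        for doc, refs in bibliographies.items()
        if doc != parent
    ]
    # Keep only documents sharing at least one reference, strongest first.
    related = [(doc, n) for doc, n in scored if n > 0]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return related[:limit]


# Hypothetical toy data: document id -> set of cited reference ids.
bibliographies = {
    "parent": {"r1", "r2", "r3"},
    "a": {"r1", "r2", "r9"},
    "b": {"r3"},
    "c": {"r7", "r8"},
}
ranked = related_records("parent", bibliographies)
```

Here document "a" shares two references with the parent and "b" shares one, so they are returned in that order; "c" shares none and is dropped. The default limit of 20 mirrors the maximum quoted in the text.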

In  an  exploratory  study  of  using  SCI  CD  Edition  for 
comparative  evaluation  of  citation  impact,  the  publication  output 
of  the  Hungarian  pharmaceutical  company  CHINOIN  in  1986  was 
investigated.  Three  conclusions  from  the  study  are:

a.  Both  for  CHINOIN  publications  and  for  the  "related  records", 
observed  citation  rates  per  paper  fall  short  of  expected  values. 
Thus  it  seems  that  the  research  topics  of  CHINOIN  are  not  the
"hottest  spots"  of  their  respective  subject  fields,  which  does  not,
however,  qualify  the  research  by  any  means.

b.  Although  the  expected  citation  rate  of  CHINOIN  publications  is 
rather  close  to  that  of  the  standard  reference  set  ("related
records"),  their  actual  citation  rate  falls  far  below.  Earlier
studies  concerning  longer  time  periods  did  not  show  such  a  gap 
between  expected  and  observed  citation  rates.  The  relatively  low 
rate  of  subsequent  year  citations  can  most  probably  be  attributed 
to  insufficient  informal,  prepublication  communication  of  research. 

c.  The  observed  citation  rate  of  the  related  records  is 
conspicuously  close  to  the  expected  citation  rate  of  the  "parent" 
CHINOIN  publications.  This  finding,  in  a  sense,  validates  the  use 
of  relative  scientometric  indicators  based  on  the  comparison  of 
actual  with  expected  (journal  average)  citation  rates.  At  least  in 
the  case  of  the  present  sample,  the  much  more  sophisticated
"customized"  control  group  -  compiled  on  the  principle  of
bibliographic  coupling  -  obtains  the  same  citation  level  as  reference
standard  as  did  the  simple  journal  average.
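
The comparison of observed with expected citation rates in the conclusions above can be illustrated with a small sketch. It assumes the relative indicator is the ratio of mean observed to mean expected citations per paper; all numbers are hypothetical and are not the CHINOIN data.

```python
from statistics import mean
from typing import Sequence


def relative_citation_rate(
    observed: Sequence[float], expected: Sequence[float]
) -> float:
    """Mean observed citations per paper over mean expected citations per paper."""
    return mean(observed) / mean(expected)


# Hypothetical values, one entry per assessed paper.
paper_citations = [2, 0, 4, 2]      # citations actually received
journal_averages = [4, 2, 6, 4]     # expected rate: publishing-journal average
related_citations = [3, 2, 3, 2]    # mean citations of each paper's related records

# Same observed citations, two different reference standards.
rcr_journal = relative_citation_rate(paper_citations, journal_averages)
rcr_related = relative_citation_rate(paper_citations, related_citations)
```

In this toy case the papers look weaker against the journal average than against their related records, which is the kind of discrepancy the choice of reference standard can introduce.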

In  subject  fields  less  coherent  than  pharmaceutical  research,
however,  the  differences  might  be  much  more  substantial,  and  the  use
of  the  set  of  related  records  as  a  more  reliable  reference  standard
is  certainly  worth  the  additional  effort.

3.  The  Set  of  Cited  Journals  as  Reference  Standard

The  set  of  publications  to  be  assessed  may  represent  various 
levels  of  aggregation,  such  as  research  teams,  institutions,  or 
whole  research  communities  of  a  given  subfield  in  a  given  country. 
Independently  of  the  level  of  investigation,  the  publishing  journal 
is  a  useful  and  reliable  reference  standard  for  citation 
assessments  -  bearing  in  mind  the  caveats  earlier  mentioned.  In 
one  particular  case,  however,  this  approach  fails  completely, 
namely,  if  journals  themselves  are  subjected  to  comparative 
assessment.  There  is  an  ever  growing  interest  in  evaluation  of 
journals  by  citation  analysis  and  one  of  the  crucial  questions,  in 
this  case  too,  is  the  comparison  of  journals  publishing  in  science 
subfields  of  inherently  different  citation  levels. 

One  possible  solution  might  be  again  the  use  of  related 
records.  It  is,  however,  practically  impossible  to  retrieve  the
related  records  to  every  single  article  of  just  one  volume  of  a 
medium  size  journal  and  to  collect  their  citations. 

Standardization  of  citation  levels  by  subfields  and  comparing 
the  standardized  scores  has  been  attempted.  This  approach  was 
found  to  be  loaded  with  the  inherent  arbitrariness  in  the 
categorization  of  the  journals  into  subfields  and  the  ambiguity  of 
treating  inter-  or  multidisciplinary  journals. 

A  method  which  now  seems  to  provide  the  most  satisfactory
resolution  at  the  lowest  cost  in  terms  of  computer  and/or  manual
search  is  based  on  the  journals  in  the  reference  lists  of  the
articles  of  the  journal  in  question.  These  journals  were  selected
by  the  most  reliable  persons,  the  authors  of  the  journal,  as
references  (in  both  senses  of  the  word)  and  can  therefore  justly
be  regarded  as  standards  of  the  expected  citation  rate.

All  but  a  very  few  journals  fall  far  below  the  standard  set  by 
their  references.  This  is  perhaps  because  authors  tend  to  base 
their  statements  on  the  most  authoritative  sources.  In  every 
research  area,  a  hierarchy  of  journals  is  set  up  with  one  or  just
a  few  journals  on  the  top,  and  all  others  tend  to  cite  "upwards".

A  detailed  study  has  been  made  on  2459  journals  covered 
continuously  by  SCI  in  the  period  1981-1985,  and  publishing  at
least  50  papers  in  these  five  years.  Only  140  of  them  proved  to  be 
cited  above  the  average  of  their  cited  references.  This  subset  may 
rightly  be  considered  the  "chosen  few"  of  the  community  of 
journals. 
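
A minimal sketch of the cited-journal standard follows. It assumes the standard is the average citation rate of the journals appearing in a journal's reference lists, weighted by how often each is referenced; that weighting, the function name, and all figures are hypothetical rather than taken from the study.

```python
from typing import Dict


def cited_journal_standard(
    reference_counts: Dict[str, int], citation_rates: Dict[str, float]
) -> float:
    """Reference-weighted mean citation rate of the journals a journal cites."""
    total_refs = sum(reference_counts.values())
    weighted = sum(
        count * citation_rates[journal]
        for journal, count in reference_counts.items()
    )
    return weighted / total_refs


# Hypothetical journal under assessment: how often it references each journal,
# and the citation rate (citations per paper) of those referenced journals.
reference_counts = {"Journal A": 50, "Journal B": 30, "Journal C": 20}
citation_rates = {"Journal A": 6.0, "Journal B": 4.0, "Journal C": 2.0}
own_citation_rate = 2.5

standard = cited_journal_standard(reference_counts, citation_rates)
relative_to_standard = own_citation_rate / standard
```

A value of `relative_to_standard` below 1 corresponds to a journal that is cited less than the journals it references, which the text reports as the usual case.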

A  closer  look  at  this  subset  reveals  that  a  considerable 
number  of  these  journals  are  review  journals,  some  of  them  having
the  word  "review"  even  in  their  title.  This  is  not  too  surprising,
since  review  papers  are  well  known  to  be  cited  much  above  the 
average.  It  is,  however,  interesting  to  realize  that  analysis  of 
cited  journals  provides  a  simple  means  to  distinguish  review 
journals  from  "ordinary"  ones.  The  indicator  is  the  fraction  of 
journal  self-citations  in  all  citations.  Evidently,  this  fraction 
is  much  lower  for  review  journals  (collecting,  by  their  very 
nature,  references  from  a  much  wider  pool  of  journals)  than  for 
primary  journals.

Bibliometric  Validation

In  a  comprehensive  survey  of  problems  with  citation  analysis,
MacRoberts  and  MacRoberts  [1996]  list  many  deficiencies  of  the
method.  In  particular,  they  read  papers  in  technical
fields  with  which  they  were  familiar,  and  compared  the  influence 
evident  in  the  text  with  what  was  contained  in  the  bibliography. 
They  found  that  approximately  30%  of  the  influence  was  cited. 
Their  paper  is  one  of  the  few  cases  where  this  type  of  validation 
study  has  been  performed. 

The  Pied  Piper  Effect 

One  of  the  main  concerns  with  using  citations  as  a  stand-alone 
measure  of  quality  and  impact  has  been  the  potential  bimodal 
interpretation  of  the  numerical  results.  A  paper  could  receive 
high  citations  because  of  its  high  quality,  or  because  the  citers 
disagree  with  it.  However,  there  is  a  third  interpretation  which 
further  precludes  citations  being  utilized  in  stand-alone  mode, 
which  the  author  has  termed  the  "Pied  Piper"  effect. 

Assume  there  is  a  present-day  mainstream  approach  in  a 
specific  field  of  research;  for  example,  the  chemical/  radiation/ 
surgical  approach  to  treating  cancer  (See  section  IV-C-3  for  a  more 
detailed  example  of  the  "Pied  Piper  Effect").  Assume  that  in,  say,
fifty  years  a  cure  for  cancer  is  discovered,  and  the  curative 
approach  has  nothing  to  do  with  today's  research.  In  fact,  assume 
it  turns  out  that  today's  approach  was  completely  orthogonal  or 
even  antithetical  to  the  correct  approach.  Then  what  meaning  can 
be  ascribed  to  research  papers  in  cancer  today  which  are  highly 
cited  for  supposedly  positive  reasons? 

In  this  case,  a  paper's  citations  are  a  measure  of  the  extent 
to  which  the  paper's  author  has  persuaded  the  research  community
that  the  research  direction  contained  in  his  paper  is  the  correct 
one,  and  not  a  measure  of  the  intrinsic  correctness  of  the  research 
direction.  In  fact,  the  citations  may  reflect  the  desire  of  a 
closed  research  community  (the  author  and  the  citers)  to  persuade 
a  larger  community  (which  could  include  politicians  and  other 
resource  allocators)  that  the  research  direction  is  the  correct 
one.  This  is  the  "Pied  Piper"  effect.  The  large  number  of
citations  in  the  above  hypothetical  medical  example  becomes  a 
measure  of  the  extent  of  the  problem,  the  extent  of  the  diversion 
from  the  correct  path,  not  the  extent  of  progress  toward  the 
solution.  The  "Pied  Piper"  effect  is  a  key  reason  why,  especially 
in  the  case  of  revolutionary  research,  citations  and  other 
quantitative  measures  must  be  part  of  and  subordinate  to  a  broadly 
constituted  peer  review  in  any  credible  evaluation  and  assessment 
of  research  impact  and  quality. 

Examples  of  Bibliometric  Studies 

Macroscale  bibliometric  studies  characterize  science  activity 
at  the  national  [e.g.,  Hicks,  1986;  Braun,  1989],  international, 
and  discipline  level.  The  biennial  Science  and  Engineering 
Indicators  report  [NSF,  1996]  tabulates  data  on  characteristics  of 
personnel  in  science,  funds  spent,  publications  and  citations  by
country  and  field,  and  many  other  bibliometric  indicators.  Another
study  at  the  national  level  was  aimed  at  evaluating  the  comparative
international  standing  of  British  science  [Martin,  1990].  Using
publication  counts  and  citation  counts,  the  authors  evaluated 
scientific  output  of  different  countries  by  technical  discipline  as 
a  function  of  time. 

There  is  little  evidence  that  the  results  from  such  studies 
have  much  influence  on  policy  or  decision-making;  i.e.,  the 
allocation  of  resources.  As  Martin  et  al.  point  out  in  their
conclusions,  there  is  potential  benefit  for  a  country  to  understand 
its  position  vis-a-vis  that  of  its  competitors  in  different  science 
areas,  in  order  to  be  able  to  exploit  opportunities  which  may  arise 
in  those  areas.  However,  which  indicators  are  appropriate  and  how 
they  should  impact  allocation  decisions  are  open  questions. 

There  have  been  numerous  microscale  bibliometric  studies 
reported  in  the  literature  [e.g.,  Freune,  1983;  McAllister,  1983; 
Mullins,  1987,  1988;  Moed,  1988;  Irvine,  1989;  Van  Raan,  1989; 
Luukkonen,  1990a,  1990b,  1992].  With  the  notable  exception  of  the 
NIH  [OTA,  1986],  few  Federal  agencies  report  use  of  microscale 
bibliometric  studies  to  evaluate  programs  and  influence  research 
planning  in  the  published  literature.  The  NIH  bibliometric-based 
evaluations  included  the  effectiveness  of  various  research  support 
mechanisms  and  training  programs,  the  publication  performance  of 
the  different  institutes,  the  responsiveness  of  the  research 
programs  to  their  congressional  mandate,  and  the  comparative 
productivity  of  NIH-sponsored  research  and  similar  international 
programs.

Two  recent  papers  [Narin,  1987b,  1989]  described  determination 
of  whether  significant  relationships  existed  among  major  cancer 
research  events,  funding  mechanisms,  and  performer  locations;
compared  the  quality  of  research  supported  by  large  grants  and
small  grants  from  the  National  Institute  of  Dental  Research; 
evaluated  patterns  of  publication  of  the  NIH  intramural  programs  as 
a  measure  of  the  research  performance  of  NIH;  and  evaluated  quality 
of  research  as  a  function  of  size  of  the  extramural  funding 
institution.  Most  of  the  NIH  studies  focused  on  aggregated 
comparison  studies  (large  grants  vs.  small,  large  schools  vs.  small
schools,  domestic  vs.  foreign,  etc.).

Patent  citation  analysis  has  the  potential  to  provide  insight 
to  the  conversion  of  science  to  technology  [Carpenter,  1981,  1982, 
1983;  Narin,  1984;  Wallmark,  1986;  Collins,  1988;  Narin,  1988a, 
1988b,  1988c;  Van  Vianen,  1990;  Narin,  1991,  1992].  Much  of  the 
Federal  government  support  of  the  development  of  patent  citation 
analysis  was  by  the  NSF  [e.g.,  Carpenter,  1980;  Narin,  1987a],
although  there  is  little  published  evidence  now  of  widespread
Federal  use  of  this  capability.  Some  recent  studies  have  focused
on  utilization  of  patent  citation  analysis  for  corporate
intelligence  and  planning  purposes  [Narin,  1990,  1992a,  1992b].
Some  of  the  data  presented  verify  further  Lotka's  Productivity  Law, 
where  relatively  few  people  in  a  laboratory  are  producing  large 
numbers  of  patents.  In  the  example  presented  in  Narin  [1992b],  the 
patents  of  the  most  productive  inventor  are  highly  cited,  further
demonstrating  his  key  importance.  Narin  concludes  that  highly
productive  research  labs  are  built  around  a  small  number  of  highly 
productive,  key  individuals. 
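
For readers unfamiliar with Lotka's law, a short sketch of its classical inverse-square form may help: the share of contributors with exactly n items is taken to be proportional to 1/n^2. Applying this form to patents within a single laboratory, and the exponent of 2 itself, are assumptions made here for illustration; the patent data in the cited studies are not reproduced.

```python
import math

# Normalizing constant: 1 / sum(1/n^2 for n = 1, 2, ...) = 6 / pi^2.
LOTKA_CONSTANT = 6.0 / math.pi ** 2


def lotka_fraction(n: int) -> float:
    """Expected share of contributors with exactly n items (inverse-square law)."""
    return LOTKA_CONSTANT / n ** 2


def share_with_at_least(k: int) -> float:
    """Expected share of contributors with k or more items."""
    return 1.0 - sum(lotka_fraction(n) for n in range(1, k))


single_item_share = lotka_fraction(1)      # roughly 61 percent
prolific_share = share_with_at_least(10)   # roughly 6 percent
```

Under this idealized distribution most contributors produce a single item while a small minority produces ten or more, which is the skew the text refers to.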

An  ongoing  study  of  citations  to  scientific  papers  from  the 
front  pages  of  U.S.  patents  has  potentially  important  implications 
for  science  and  technology  policy.  Some  results  showed  that,  for 
different  countries  that  file  patents  with  the  U.S.  patent  system, 
each  country's  patents  in  the  U.S.  cite  their  own  scientific  papers 
three  times  as  often  as  would  be  expected,  after  normalizing  out 
the  size  of  each  country's  science  [Narin,  1994].  To  end  this 
discussion  of  patent  citation  analysis  on  a  cautionary  note, 
courtesy  of  Pavitt  [1991],  it  is  not  yet  clear  to  what  extent  the 
'other  publications',  cited  in  patents,  reproduce  basic  or  applied 
research,  from  universities  or  corporate  laboratories.  In 
addition,  a  high  proportion  [Pavitt's  estimation]  of  technology  is
not  patented,  because  it  is  kept  secret,  because  it  is  tacit  and
non-codifiable  art,  or  because  -  as  in  the  case  of  software
technology  -  it  is  very  difficult  to  protect  through  patents. 

Despite  these  limitations,  bibliometrics  may  have  utility  in 
providing  insight  into  research  product  dissemination.  For 
example,  in  a  recent  series  of  presentations  to  large
Federally-funded  laboratories  [Kostoff,  1992b],  the  following  suite  of
bibliometric  studies  was  proposed: 

1.  Examine  distribution  of  disciplines  in  co-authored  papers, 
to  see  whether  the  multidisciplinary  strengths  of  the  lab  are  being 
utilized  fully; 

2.  Examine  distribution  of  organizations  in  co-authored 
papers,  to  determine  the  extent  of  lab  collaboration  with 
universities/  industry/  other  labs  and  countries;

3.  Examine  nature  (basic/  applied)  of  citing  journals  and 
other  media  (patents),  to  ascertain  whether  lab's  products  are 
reaching  the  intended  customer(s);

4.  Determine  whether  the  lab  has  its  share  of  high  impact
(heavily  cited)  papers  and  patents,  viewed  by  some  analysts  as  a 
requirement  for  technical  leadership; 

5.  Determine  which  countries  are  citing  the  lab's  papers  and 
patents,  to  see  whether  there  is  foreign  exploitation  of  technology 
and  in  which  disciplines; 

6.  Identify  papers  and  patents  cited  by  the  lab's  papers  and 
patents,  to  ascertain  degree  of  lab's  exploitation  of  foreign  and 
other  domestic  technology. 

While  it  was  also  recommended  that  the  lab  compare  its  output 
(papers/  citations  normalized  over  disciplines)  with  that  of  other 
similar  institutions,  this  quantitative  comparison  should  be 
approached  with  great  caution.  A  recent  comparative  bibliometric 
analysis  of  53  laboratories  [Miller,  1992]  clustered  the  labs  into 
six  types  (Regulation  and  Control,  Project  Management,  Science 
Frontier,  Service,  Devices,  Survey),  and  stated  that  "comparisons
of  scientific  impacts  should  be  made  only  with  laboratories  that
are  comparable  in  their  primary  task  and  research  outputs".  The
report  concluded  further  that:

1.  Bibliometric  indicators  and  scientific  publications  are  not 
the  only  outputs  that  should  be  measured,  but  the  other  types  of 
outputs  differ  for  different  labs; 

2.  Bibliometric  indicators  are  not  equally  valid  across 
different  types  of  laboratories; 

3.  Bibliometric  indicators  are  less  useful  for  the  evaluation 
of  research  laboratories  involved  in  closed  publication  markets. 

In  addition,  studies  were  performed  [Kostoff,  1992c]  to  track 
the  dissemination  of  information  from  accelerated  research 
programs.  Key  papers  (P1)  resulting  from  these  programs  were
identified,  then  the  citing  papers  for  these  key  papers  (P2)  were
identified,  then  the  next  generation  of  citing  papers  (P3)  which
cited  P2  were  identified,  and  so  on.  The  breadth  of  disciplines
impacted  by  the  key  papers  (P1)  can  be  identified  from  the
succeeding  generations  of  citing  papers.  The  type  of  analysis  done 
so  far  provided  more  of  a  qualitative  than  quantitative  estimation 
of  breadth  of  impact. 
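
The generation-by-generation tracing described above amounts to a breadth-first walk over a citation graph. The sketch below is one way to express it, not the procedure used in the cited studies; the graph, the discipline labels, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def citing_generations(
    key_papers: Set[str], cited_by: Dict[str, Set[str]], depth: int
) -> List[Set[str]]:
    """Return [P1, P2, P3, ...]: key papers, their citers, the citers' citers."""
    generations = [set(key_papers)]
    seen = set(key_papers)
    for _ in range(depth):
        next_gen: Set[str] = set()
        for paper in generations[-1]:
            next_gen |= cited_by.get(paper, set())
        # Assign each paper to the earliest generation in which it appears.
        next_gen -= seen
        if not next_gen:
            break
        seen |= next_gen
        generations.append(next_gen)
    return generations


def disciplines_reached(generation: Set[str], discipline_of: Dict[str, str]) -> Set[str]:
    """Disciplines represented in one generation of citing papers."""
    return {discipline_of[p] for p in generation if p in discipline_of}


# Hypothetical citation graph: paper id -> ids of papers that cite it.
cited_by = {"k1": {"a", "b"}, "a": {"c"}, "b": {"c", "d"}, "c": {"a"}}
discipline_of = {
    "k1": "physics", "a": "physics", "b": "chemistry",
    "c": "materials", "d": "chemistry",
}

generations = citing_generations({"k1"}, cited_by, depth=3)
breadth = [len(disciplines_reached(g, discipline_of)) for g in generations]
```

The size of each generation grows quickly for highly cited papers, which is the data-volume problem noted in the following paragraph.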

Preliminary  results  show  that  some  very  fundamental  papers 
impact  across  a  wide  spectrum  of  disciplines,  while  some  high
quality  but  more  narrowly  focused  research  papers  impact  one  main 
discipline  very  strongly  through  succeeding  generations  of 
citations.  Because  of  the  large  amounts  of  data  required  for  a 
complete  analysis,  especially  where  highly  cited  papers  and  their 
descendants  are  concerned,  present  efforts  focus  on  methods  to
reduce  data  requirements  and  retain  a  credible  analysis. 

COST-BENEFIT/  ECONOMIC  ANALYSES 


Background 

A  comprehensive  survey  examined  the  application  of  economic 
measures  to  the  return  on  research  and  development  as  an  investment 
in  individual  industries  and  at  the  national  level  [OTA,  1986].
This  document  concluded  that  while  econometric  methods  have  been
useful  for  tracking  private  R&D  investment  within  industries,  the 
methods  failed  to  produce  consistent  and  useful  results  when 
applied  to  Federal  R&D  support. 

An  intermediate  study  published  by  the  Commission  of  the 
European  Communities  [Capron,  1992]  concluded  that  "the  economic 
quantitative  methods,  particularly  econometric  models,  should  be
viewed  as  an  ex  post  quantitative  evaluation  tool  of  the  economic 
impacts  of  science  and  technology  policy.  They  have  their 
shortcomings  and  limits.  They  are  an  instrument  in  the  toolbox  of 
policy  evaluation  which  can  be  used  for  structured  quantitative
analyses  of  the  economic  impact  of  R&D  policy.  The  economic
impact  of  government  financed  R&D  might  be  evaluated  by  using
simultaneously  existing  pinpoint  methods  and  extended
macroeconometric  models.  While  existing  pinpoint  methods  are
numerous,  the  most  commonly  used  ones  are  the  productivity  and  the
investment  approaches.  Extended  macroeconometric  models  might  be
conceived  by  adapting  present  macromodels  or  developing  adequate
models."

A  more  recent  analysis  focused  on  economic/  cost-benefit 
approaches  used  for  research  evaluation  [Averch,  1994].  The 
methods  involve  computing  impacts  using  market  information, 
monetizing  the  impacts,  then  comparing  the  value  of  the  impacts 
with  the  cost  of  research.  Principal  measures  described  include 
surplus  measures  and  productivity  measures.  With  known  benefit  and 
cost  time  streams,  internal  rates  of  return  to  R&D  investments  are 
then  computed.  The  paper  notes  both  the  standard  technical 
difficulties  with  these  approaches  and  the  political  and 
organizational  difficulties  in  implementing  them. 
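
As an illustration of the internal-rate-of-return computation mentioned above, the following sketch finds the discount rate at which the net present value of a cost and benefit stream is zero. It uses simple bisection and assumes the net present value changes sign exactly once over the search interval; the cash flows are hypothetical.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[t] occurs t years after the start."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(
    cash_flows: Sequence[float],
    low: float = -0.99,
    high: float = 10.0,
    tol: float = 1e-9,
) -> float:
    """Internal rate of return by bisection on npv(rate) = 0."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    mid = low
    for _ in range(200):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) < tol:
            break
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return mid


# Hypothetical stream: research cost of 100 now, benefit of 121 in year 2.
rate = irr([-100.0, 0.0, 121.0])
```

For this stream the rate is 10 percent, since 121 / 1.1^2 = 100. Real benefit streams are far less certain, which is the difficulty the paper cited above emphasizes.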

Classical  Microlevel  Application 

Cost-benefit  analysis  has  limited  accuracy  when  applied  to 
basic  research  because  of  the  quality  of  both  the  cost  and  benefit 
data  due  to  the  large  uncertainties  characteristic  of  the  research 
process,  as  well  as  selection  of  a  credible  origin  of  time  for  the 
discounting  computations.  As  an  illustrative  example,  a  cost- 
benefit  analysis  performed  on  a  fusion  reactor  variant  (the  fusion- 
fission  hybrid,  essentially  a  fission  reactor  driven  by  fusion 
neutrons  which  can  produce  both  fissile  fuel  and  power)  will  be 
described  in  some  detail. 

Rutherford's  experiments  in  1934  involving  interaction  of  a 
deuteron  beam  with  solid  deuterium  can  be  viewed  as  the  genesis  of 
fusion  fuel  cycle  research  [Kostoff,  1983a].  Almost  since  the
formation  of  the  AEC  in  the  mid-1940s,  the  Federal  government  has
invested  significant  sums  of  money  for  the  potential  promise  of 
controlled  fusion  as  an  essentially  limitless  source  of  energy.  In 
1979,  an  economic  analysis  based  on  capital  costs  was  performed  on 
the  fusion  hybrid  and  a  comparison  was  made  with  two  major
contenders  for  the  same  type  of  product,  fast  breeders  and 
accelerator  breeders  [Kostoff,  1979].  The  results  showed  projected 
cost  savings  (for  different  parameter  variations)  for  developed 
fusion  hybrid  systems  but  did  not  address  the  time  distribution  or 
magnitude  of  development  costs.  Subsequent  technical  studies 
showed  ranges  of  favorable  operating  conditions  based  on  fusion 
reactor  cycling  times  [Kostoff,  1981,  1982a,  1982b,  1983b,  1985]. 

To  evaluate  the  economic  potential  of  the  fusion-fission 
hybrid,  an  incremental  cost-benefit  analysis  was  performed 
[Kostoff,  1983a].  While  fusion-related  expenditures  could  be
traced  back  to  Rutherford's  experiments  in  1934,  this  study  ignored
fusion  hybrid  research  expenditures  before  1980  (sunk  costs  from
the  perspective  of  1980).  For  the  parameter  ranges  chosen,  it  was
shown  that  there  was  a  broad  region  over  which  hybrid  development
could  prove  cost-effective.  However,  had  this  same  analysis  been
done  in  1934  (around  the  beginning  of  identifiable  basic  research
for  fusion),  using  the  same  cost  and  benefit  streams  as  in  the  1983
study  plus  adding  costs  incurred  between  1934  and  1980  and
discounting  back  to  1934,  then  the  result  would  have  been  much
different  from  the  1983  study.
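
The sensitivity to the chosen origin of time can be shown with a toy calculation. The sketch below discounts the same future cost and benefit to two different origin years, adding earlier costs in the second case. Every amount, the 5 percent rate, and the benefit year are hypothetical and are not taken from the 1983 study.

```python
from typing import Dict


def present_value(flows_by_year: Dict[int, float], origin_year: int, rate: float) -> float:
    """Discount each (year, amount) flow back to origin_year."""
    return sum(
        amount / (1.0 + rate) ** (year - origin_year)
        for year, amount in flows_by_year.items()
    )


RATE = 0.05  # hypothetical discount rate

# View from 1980: earlier spending is sunk and ignored.
flows_from_1980 = {1980: -10.0, 2000: 100.0}
pv_1980 = present_value(flows_from_1980, 1980, RATE)

# View from 1934: same flows, plus hypothetical costs incurred before 1980.
flows_from_1934 = {1934: -2.0, 1950: -3.0, 1965: -4.0, **flows_from_1980}
pv_1934 = present_value(flows_from_1934, 1934, RATE)
```

With these illustrative numbers the project looks attractive from 1980 and unattractive from 1934, because the benefit is discounted over 66 years rather than 20 while the early costs are discounted hardly at all.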

In  the  1983  study,  the  problem  was  treated  deterministically;
uncertainties  or  probabilities  of  success  of  the  different
parameter  values  being  achieved  were  not  taken  into  account.  The
real  problem,  which  pervades  and  limits  any  attempt  to  perform  a
cost-benefit  analysis  on  a  concept  in  the  basic  research  stage,
was  the  inherent  uncertainty  of  controlling  the  fusion  process.
This  translated  to  the  inability  to  predict  the  probabilities  of
success  and  time  and  cost  schedules  for  overcoming  fundamental
plasma  research  problems  (e.g.,  plasma  stabilities  and  confinement
times);  no  credible  methods  were  available.  Thus,  the  main  value
of  the  cost-benefit  approach  was  to  show  that  the  potential  existed
for  positive  payoff  from  the  hybrid  reactor  development,  that  there
was  a  credible  region  in  parameter  space  in  which  controlled  fusion
development  could  prove  cost  effective;  what  was  missing  was  the
likelihood  of  achieving  that  payoff.

Macrolevel  Analyses 

Much  of  the  major  recent  economic  work  relating  economic 
growth/  productivity  increases  to  R&D  spending  has  been  performed 
by  three  economists  [Mansfield,  1980,  1991;  Terleckyj,  1977,  1985;
Griliches,  1979].  Probably  the  most  widely  publicized  work  over 
the  past  decade  to  examine  rates  of  return  from  basic  research  has 
been  that  of  Mansfield  [e.g.,  Mansfield,  1980,  1991].  His  results 
indicated  that  substantial  social  rates  of  return  can  be  attributed 
to  basic  research.  While  use  of  his  methods  by  government 
officials  has  not  been  reported  in  the  literature,  the  methods  have 
received  widespread  attention  among  research  policy-makers. 
Because  of  the  potential  impact  of  these  methods  if  adopted,  both 
his  production  function  and  recent  marginal  cost-benefit  approaches 
will  be  discussed. 

Production  Function  Approach 

The  earlier  study  [Mansfield,  1980]  attempted  to  determine 
whether  an  industry's  or  firm's  rate  of  productivity  change  was 
related  to  the  amount  of  basic  research  it  performed.  Mansfield 
developed  a  production  function  which  disaggregated  basic  and 
applied  research,  then  regressed  rate  of  productivity  increase  with 
many  different  variables.  The  regressions  showed  a  strong 
relationship  between  the  amount  of  basic  research  carried  out  by  an 
industry  and  the  industry's  rate  of  productivity  increase  during 
1948-1966. 

However,  many  assumptions  were  necessary  to  solve  the 
equations:  constancy  of  ratios  of  variables  over  time;  neglect  in 
the  actual  regression  equations  solved  of  the  (long)  lag  time 
between  when  the  research  is  performed  and  when  the  productivity 
change  is  measured  (though  this  point  is  recognized  and  discussed 
by  Mansfield);  and  the  inherent  uncertainties  in  the  data  used  in
the  equations.  The  results  have  to  be  treated  as  highly  uncertain. 
In  fact,  Mansfield's  results  are  somewhat  inconsistent  with  the 
findings  of  the  second  part  of  his  study,  which  showed,  for  119 
major  firms  surveyed,  that  the  proportion  of  R&D  expenditures 
devoted  to  basic  research  and  to  relatively  risky  projects  declined 
between  1967  and  1977  in  most  industries.  Would  firms  reduce  their
own  basic  research  expenditures  if  they  felt  that  their  own  basic
research  expenditures  would  result  in  increased  productivity? 

Finally,  there  is  the  problem  inherent  in  multiple  regression 
analyses:  that  of  determining  cause  and  effect  from  what  is 
essentially  correlation.  As  Mansfield  points  out,  "It  is  possible 
that  industries  and  firms  with  high  rates  of  productivity  growth 
tend  to  spend  relatively  large  amounts  on  basic  research,  but  that
their  high  rates  of  productivity  growth  are  not  due  to  these 
expenditures"  [Mansfield,  1980].  Nor  does  Mansfield's  model 
specify  the  path(s)  by  which  R&D  investment  supposedly  leads  to 
productivity  improvements.

Recent  Macrolevel  Marginal  Cost-Benefit  Application 

A  recent  study  weighed  the  costs  of  academic  research  against 
the  benefits  realized  from  the  earlier  introduction  of  innovative 
products  and  processes  due  to  the  academic  research  [Mansfield, 
1991].  A  survey  of  corporate  R&D  executives  showed  that  an  average
of  seven  years  elapsed  between  a  research  finding  and 
commercialization,  and  that  commercialization  would  have  been 
delayed  an  average  of  eight  years  without  academic  research.  A 
cost-benefit  analysis  using  this  survey  data  showed  a  very  high 
social  rate  of  return  resulting  from  academic  research. 

However,  the  data  were  not  validated  independently  by  a 
document-based  type  of  analysis  (such  as  TRACES  or  Hindsight,
retrospective  studies  of  innovations)  of  a  sample  number  of  the
products  and  processes.  The  time  between  the  research  findings  and 
commercialization  is  very  short  compared  to  the  results  of 
Hindsight  or  the  TRACES  studies,  and  is  more  in  line  with  the  lag 
time  between  the  end  of  basic  research  and  commercialization  shown 
by  Hindsight/TRACES.  Use  of  a  shorter  lag  time  in  the  discounting
process  increases  the  benefit/cost  ratio  and  the  social  rate  of
return.  While  the  method  is  innovative,  a  more  objective  data
source  would  provide  higher  confidence  in  the  computed  rates  of
return.
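
The effect of lag time on the benefit/cost ratio follows directly from the discounting formula and can be checked with a short sketch. Only the seven-year lag comes from the survey described above; the benefit, cost, discount rate, and the longer 20-year lag are hypothetical values chosen for contrast.

```python
def benefit_cost_ratio(
    benefit: float, cost: float, lag_years: int, discount_rate: float
) -> float:
    """Benefit received lag_years after the cost, discounted to the cost date."""
    discounted_benefit = benefit / (1.0 + discount_rate) ** lag_years
    return discounted_benefit / cost


# Same undiscounted benefit and cost, two assumed lags at a 10 percent rate.
ratio_short_lag = benefit_cost_ratio(100.0, 10.0, lag_years=7, discount_rate=0.10)
ratio_long_lag = benefit_cost_ratio(100.0, 10.0, lag_years=20, discount_rate=0.10)
```

With these inputs the ratio falls from about 5.1 to about 1.5 when the lag grows from 7 to 20 years, so the assumed lag largely determines the computed return.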


COST-EFFICIENCY 

A  recent  production  function  approach  to  cost-efficiency  of 
basic  research  essentially  used  a  regression  analysis  between 
outputs  and  inputs  [Averch,  1987,  1989].  In  its  latest 
incarnation,  performed  on  NSF  Chemistry  proposals  when  Averch  was 
at  NSF,  the  method  involved  regressing  output  variables  (citations 
per  dollar,  graduate  students  per  dollar)  against  input  variables 
(e.g.,  quality  of  the  investigator's  department,  quality  of  the 
investigator,  etc.).  The  results  gave  some  idea  of  the  importance 
of  the  input  variables,  alone  or  in  combination,  on  the  output 
variables.  One  obvious  potential  application  would  be  prediction 
of  proposals  likely  to  have  high  productivity  based  on  prior 
(input)  knowledge.  Much,  however,  remains  to  be  done  in 
identifying  the  appropriate  output  measures,  the  appropriate  input 
measures,  and  the  nature  of  the  interactions  among  these  measures 
for  different  disciplines. 
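
A heavily simplified sketch of the regression idea is given below. It fits a single output measure against a single input measure by ordinary least squares, whereas the approach described above used several inputs and outputs; the variable names and data are hypothetical and are not NSF data.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical inputs and outputs, one entry per funded proposal.
department_quality = [1.0, 2.0, 3.0, 4.0]          # input: rating score
citations_per_kilodollar = [1.0, 3.5, 4.5, 7.0]    # output: productivity measure

slope, intercept = fit_line(department_quality, citations_per_kilodollar)

# Predicted output for a new proposal from a department rated 5.0.
predicted = intercept + slope * 5.0
```

A fit of this kind describes association only; as noted for the production function studies, it does not by itself establish that the input causes the output.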

CO-OCCURRENCE  PHENOMENA


Background 

Modern  quantitative  techniques  utilize  computer  technology 
extensively,  usually  supplemented  by  network  analytic  approaches, 
and  attempt  to  integrate  disparate  fields  of  research.  One  class 
of  techniques  which  tends  to  focus  more  on  macroscale  impacts  of 
research  exploits  the  use  of  co-occurrence  phenomena.  In
co-occurrence  analysis,  phenomena  that  occur  together  frequently  in
some  domain  are  assumed  to  be  related,  and  the  strength  of  that 
relationship  is  assumed  to  be  related  to  the  co-occurrence 
frequency.  Networks  of  these  co-occurring  phenomena  are 
constructed,  and  then  maps  of  evolving  scientific  fields  are 
generated  using  the  link-node  values  of  the  networks.  Using  these 
maps  of  science  structure  and  evolution,  the  research  policy 
analyst  can  develop  a  deeper  understanding  of  the 
interrelationships  among  the  different  research  fields  and  the 
impacts  of  external  intervention,  and  can  recommend  new  directions 
for  more  desirable  research  portfolios. 
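
The counting step behind any co-occurrence analysis can be sketched briefly. Items are counted as co-occurring when they appear in the same record, and the counts serve as link strengths in a network. The records and keywords below are hypothetical, and mapping or clustering of the resulting network is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_occurrence_counts(records: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of items appears in the same record."""
    counts: Counter = Counter()
    for items in records:
        # Sort and deduplicate so that each pair has one canonical key per record.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical records, e.g. keyword lists of three papers.
records = [
    ["laser", "plasma", "fusion"],
    ["plasma", "fusion"],
    ["laser", "optics"],
]
links = co_occurrence_counts(records)
strongest: List[Tuple[Tuple[str, str], int]] = links.most_common(1)
```

The same counting applies whether the items are cited references (co-citation), words (co-word), nominated researchers (co-nomination), or patent classification codes (co-classification); only the source of the records changes.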

Little  evidence  of  Federal  use  of  these  techniques
(co-citation,  co-word,  co-nomination,  and  co-classification  analysis)
has  been  reported  in  the  open  literature.  However,  as  computerized 
databases  get  larger,  and  more  powerful  computer  software  and 
hardware  become  readily  available,  their  utilization  in  assessing 
research  impact  should  increase  substantially.  These  techniques 
are  discussed  in  more  detail  in  Kostoff  [1992a-  Appendix  III, 
1993c,  1994h] ;  Tijssen  [1994].  The  Tijssen  paper  contains  an 
excellent  exposition  on  mapping  techniques  for  displaying  the 
structure  of  related  science  and  technology  fields. 

Overview  Summary 

Co-citation  analysis  has  been  applied  to  scientific  fields, 
and  co-citation  clusters  have  been  mapped  to  represent  research- 
front  specialties  [Tijssen,  1994].  Co-word  has  been  utilized  to 
map  the  evolution  of  science  under  European  (mainly  French) 
government  support,  and  has  the  potential  to  supplement  other 
research  impact  evaluation  approaches.  Co-nomination,  in  its 
different  incarnations,  has  been  used  to  construct  social  networks 
of  researchers  and  has  the  potential,  if  expanded  to  include 
research  and  technology  impacts  in  the  network  link  values,  for 
evaluating direct and indirect impacts of research.
Co-classification is based on co-occurrences of classification codes
in  patents,  and  is  used  to  construct  maps  of  technology  clusters 
[Engelsman,  1991]. 

Co-citation  Analysis 

Three of the co-occurrence techniques more applicable to the
science evolution problem, listed in order of level of development
and frequency of utilization, are co-citation, co-word, and
co-nomination. In co-citation analysis, the frequencies with which
references in published documents are cited together are obtained,
and are eventually used to generate maps of clusters of cohesive
research themes. Co-citation analysis was developed about two
decades ago, when the Science Citation Index became more readily
available for computer analysis, and it has spawned a number of
studies and reviews, a few of which are listed here [Small, 1973,
1977, 1978; Garfield, 1978; Small, 1980, 1985a, 1985b, 1986;
Franklin, 1988; Oberski, 1988; Braam, 1991a, 1991b].
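
The counting step that underlies co-citation analysis can be sketched as follows. This is a minimal illustration only, not the procedure used in any of the cited studies: the function name, the data layout (one list of cited references per citing paper), and the reference labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each unordered pair of references is cited together.

    reference_lists: one list of cited-reference labels per citing paper
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so every unordered pair has one canonical key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers and the references each one cites.
citing_papers = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_A", "ref_B"],
    ["ref_A", "ref_D"],
]
cocitations = cocitation_counts(citing_papers)
print(cocitations[("ref_A", "ref_B")])
```

Pairs with high counts would then be treated as strongly related and passed on to whatever clustering and mapping step the analyst chooses; that later step is not shown here.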

It should be noted that co-citation is a rather indirect
approach to obtaining connectivity among research areas, and it
involves a number of abstract steps. Querying the author(s) of a
research paper about what other research areas are related to their
work would be the most direct method of obtaining the desired data
[Kostoff, 1991c, 1992a-Appendix I, 1994i]. Obtaining this
information by analyzing the words in the paper and related papers
would be the next most direct method. Obtaining this information
by examining citations and co-citations restricts the types of
documents which can be analyzed (essentially published papers) and
requires the additional assumption that the themes of two articles
co-cited many times by authors must be strongly related. While the
co-citation proponents claim that "many potentially useful
applications have been demonstrated" [Franklin, 1988], others
conclude that "results of co-citation cluster analyses cannot be
taken seriously as evidence relevant to the formulation of research
policy" [Oberski, 1988].

Co-nomination  Analysis 

Co-nomination is a particular example of the more general
social network analysis used to study communication among workers
in the fields of science and technology. Generally, in
co-nomination, experts in a given field are asked to identify other
experts, and then a network is generated which shows the different
linkages (and the strengths of these linkages) among all the
experts (and possibly their organizations and technical
disciplines) identified. A recent survey [Shrum, 1988] of the
development of social network analysis traces studies in this area
back at least three decades. Two of these studies are particularly
relevant to the specific co-nomination approach which will be
described, and these two studies are outlined briefly.

In  a  study  of  theoretical  high  energy  physicists  [Libbey, 
1967],  respondents  were  asked  to  name  two  persons  outside  their 
institution  with  whom  they  exchanged  research  information  most 
frequently  and  no  more  than  three  who  they  believed  to  be  doing  the 
most  important  work  in  their  area.  A  network  analysis  was  done  to 
identify  communication  linkages.  In  a  later  study  of  theoretical 
high  energy  physicists  [Blau,  1978],  respondents  were  asked  to  name 
two  persons  outside  their  institution  with  whom  they  exchanged 
information  most  frequently  about  their  research.  Again, 
communication  networks  were  generated. 

Co-nomination was developed to circumvent co-citation's
dependence upon databases consisting of refereed scientific
publications. It is a more direct approach to obtaining links
among researchers and, if combined with other network approaches
which include both links between technical fields and the link
strengths [Kostoff, 1991c, 1992a-Appendix I, 1994h, next section in
Handbook], could potentially incorporate links among researchers
and technical fields. Since co-nomination is known less well than
co-citation, its latest embodiment will be described briefly.

Researchers  are  sent  a  questionnaire  inviting  them  to  nominate 
other  researchers  whose  work  is  most  similar  or  relevant  to  their 
own.  Based  on  the  responses,  networks  are  then  constructed  by 
assuming  that  links  exist  between  co-nominated  researchers  and  that 
the strength of each link is proportional to the frequency of
co-nomination [Georghiou, 1988]. However, as is the case with
co-citation, frequency of co-occurrence may not be a unique indicator
of strength. One could postulate two cases: 1) researchers
co-nominated were doing essentially identical work, and their linkages
were very strong; and 2) researchers were doing vaguely similar
work, and their linkages were very weak. In both cases, the
frequency of co-occurrence would be the same, and the links on the
network would have the same strength.

Co-word  Analysis 

The  origins  of  co-word  analysis  in  linguistics,  lexicography, 
and  especially  computational  linguistics  can  be  found  in  Hornby 
[1942],  De  Saussure  [1949],  Firth  [1957],  Chomsky  [1965],  Halliday 
[1966],  Harris  [1968],  Sparck  Jones  [1971],  McKinnon  [1977],  Van 
Rijsbergen  [1979],  Melcuk  [1981],  Bahl  [1983],  Choueka  [1983], 
Salton [1983], Sparck Jones [1984], Benson [1986], Kittredge
[1986], Choueka [1988], McCardell [1988], Nirenberg [1988], Smadja
[1988], Amsler [1989], Church [1989], Maarek [1989], Salton [1989],
Smadja [1989], Church [1990], Iordanskaja [1990], Mays [1990],
McDonald [1990], Smadja [1991]. These origins of co-word analysis
are summarized in Kostoff [1991d, 1992a, 1993c, 1994h], along with
a  detailed  description  of  modern  day  development  and  applications 
of  co-word  analysis  to  research  policy  and  issues. 

In  summary,  co-word  has  been  utilized  to  map  the  evolution  of 
science  under  European  (mainly  French  and  Dutch)  government  support 
[Callon, 1979, 1983; Rip, 1984; Bauin, 1986; Callon, 1986;
Courtial,  1986;  Healey,  1986;  Leydesdorff,  1987a,  1987b;  Bauin, 
1988;  Rip,  1988;  Turner,  1988;  Courtial,  1989;  Leydesdorff,  1989; 
Whittaker,  1989;  Courtial,  1990a,  1990b;  Callon,  1991a;  Braam, 
1991a,  1991b;  Callon,  1991b;  Peters,  1991;  Van  Raan,  1991;  Tijssen, 
1994].  Until  recently,  the  database  used  was  essentially  limited 
to  journal  papers.  The  frequency  of  co-occurrence  of  index  or  key 
words  for  these  papers  was  the  starting  point  for  the  maps  which 
followed.  Use  of  index  words  led  to  a  biasing  termed  the  'indexer 
effect'  [Healey,  1986]  and  effectively  restricted  the  acceptability 
of  co-word  analysis  for  many  years. 

DATABASE  TOMOGRAPHY 

Recently,  a  new  co-word  approach  that  deals  directly  with  text 
and  requires  no  indexing  or  key  words  was  developed  [Kostoff, 
1991b, 1991d, 1992a, 1993c, 1993e, 1993f, 1993g, 1994h, 1994k].
The methodology can be applied to any text database, consisting of
published papers, reports, memos, etc., which can be placed on
computer storage media. This revolutionary approach has been used
to  identify  pervasive  thrust  areas  of  science  and  technology,  the 
connectivity  among  these  areas,  and  sub-thrust  areas  closely 
related  to  and  supportive  of  the  pervasive  thrust  areas.  The 
approach  utilizes  a  computer-based  algorithm  to  extract  and  order 
data  from  a  large  body  of  textual  material  which,  for  example,  may 
describe  a  broad  spectrum  of  science.  The  algorithm  extracts  words 
and  word  phrases  which  are  repeated  throughout  this  large  database, 
and  allows  the  user  to  create  a  taxonomy  of  pervasive  research 
thrusts  from  this  extracted  data.  The  algorithm  then  extracts 
words  and  phrases  which  occur  physically  close  to  the  pervasive 
research  thrusts  throughout  the  text,  and  allows  the  user  to 
determine  interconnectivity  among  the  research  thrusts,  as  well  as 
determine  research  sub-thrusts  strongly  related  to  the  pervasive 
thrusts.  While  the  focus  of  applications  has  been  to  identify 
technical  thrusts  and  their  interrelationships,  the  raw  data 
obtained  by  the  extraction  algorithms  allows  the  user  to  relate 
technical  thrusts  to  institutions,  journals,  people,  geographical 
locations, and other categories. An application to a Former Soviet
Union (FSU) text database follows. This text describes a broad
spectrum of FSU science (35 reports generated by the
Foreign Applied Sciences Assessment Center (FASAC)).

Background 

About  a  decade  ago,  the  U.S.  Federal  Government  established 
the  Foreign  Applied  Sciences  Assessment  Center  (FASAC)  under  the 
operation  of  the  Science  Applications  International  Corporation 
(SAIC). The purpose of FASAC was to increase awareness of new
foreign technologies with military, economic, or political
importance. The emphasis was placed on "exploratory research"
(Department of Defense 6.1/6.2 equivalent) in the FSU. This work
seeks to translate fundamental research into new technology.

One  of  the  main  products  of  FASAC  is  reports  on  different 
areas  of  "exploratory  research."  FASAC  assembles  panels  of  expert 
consultants  from  academia,  industry,  and  government.  Each  panel 
provides  a  written  assessment  of  the  status  and  potential  impacts 
of  foreign  applied  science  in  selected  areas.  Periodically,  an 
Integration  Report  is  generated  that  describes  the  trends  in 
foreign  research,  including  pervasive  issues  which  affect  research 
capabilities.  By  early  1992,  there  were  about  40  reports  on 
different  aspects  of  FSU  applied  science. 

Database  Tomography  utilizes  the  proximity  of  words  and  their 
frequency  of  co-occurrence  in  some  domain  (sentence,  paragraph, 
paper)  to  estimate  the  strength  of  their  relationship.  When 
applied to the literature in a technical field, Database Tomography
allows  a  map  of  the  relationship  among  technical  themes  to  be 
constructed.  The  initial  purpose  of  the  Database  Tomography 
development  was  to  identify  pervasive  research  thrusts  (thrusts 
which  transcend  disciplines)  from  those  large  text  databases  which 
contain  descriptions  of  many  research  programs  or  areas  of 
research. Two initial applications have been reported [Kostoff,
1997p]:

1.  Identification  of  pervasive  research  thrusts  in  a 
database  describing  promising  research  opportunities  for  the  Navy. 
The  database  consisted  of  thirty  reports  produced  by  the  National 
Academy  of  Sciences  panels  and  Office  of  Naval  Research  (ONR) 
internal  experts  on  15  technical  disciplines. 

2.  Identification  of  pervasive  thrusts  in  the  7400  project 
Industrial  R&D  (IR&D)  database. 

Applications  to  other  large  databases  are  ongoing,  and  they 
include  the  following: 

1.  Identification  of  pervasive  themes  and  their  relationships 
in  a  database  of  reports  (FASAC)  describing  applied  technical 
research topics in the Former Soviet Union (Kostoff, 1993e, 1993f)

2.  Identification  of  pervasive  themes  in  a  database  whose 
narrative  components  describe  each  research  project  sponsored  by 
the  Department  of  Energy 

3.  Identification  of  pervasive  themes  and  their  relationships 
in  a  database  of  journal  articles  related  to  Research  Impact 
Assessment  (Kostoff,  1995a,  1997d) 

4.  Identification  of  pervasive  themes  and  their  relationships 
in a database of journal articles consisting of one year's issues
of  the  Journal  of  the  American  Chemical  Society  (Kostoff,  1997d) 

5.  Identification  of  pervasive  themes  and  their  relationships 
in  a  database  of  journal  articles  related  to  Near-Earth  Space 
Science  and  Technology  (Kostoff,  1997e) 

The  reported  studies  and  the  present  study  have  used  the 
following  procedure: 

First,  the  frequencies  of  appearance  in  the  total  text  of  all 
single words (for example, MATRIX), adjacent double words (METAL
MATRIX), and adjacent triple words (METAL MATRIX COMPOSITES) are
computed.  The  highest  frequency  technical  content  words  are 
selected  as  the  pervasive  themes  of  the  full  database  (for  example, 
SHOCK  WAVE,  REMOTE  SENSING,  IMAGE  PROCESSING). 
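
The first step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the original Database Tomography software: the function name, the tokenizing rule, and the sample sentence are hypothetical, and phrases are allowed to run across sentence boundaries for simplicity.

```python
import re
from collections import Counter


def ngram_frequencies(text, max_n=3):
    """Return {n: Counter} of adjacent n-word phrase frequencies, n = 1..max_n.

    Tokenizing rule (an assumption of this sketch): runs of letters and
    hyphens, upper-cased; punctuation and digits are ignored.
    """
    words = re.findall(r"[A-Z][A-Z\-]*", text.upper())
    freqs = {}
    for n in range(1, max_n + 1):
        grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        freqs[n] = Counter(grams)
    return freqs


# Hypothetical sample text.
sample = (
    "Metal matrix composites resist wear. "
    "Metal matrix composites are costly. "
    "The matrix phase binds the fibers."
)
freqs = ngram_frequencies(sample)
print(freqs[3].most_common(1))
```

Selecting the high technical content words from the resulting frequency lists remains a judgment made by the analyst; the sketch only produces the raw counts from which that selection is made.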

Second,  for  each  theme  word,  the  frequencies  of  words  within 
+-50  words  of  the  theme  word  for  every  occurrence  in  the  full  text 
are  computed.  A  word  frequency  dictionary  is  constructed  which 
shows the words closely related to the theme word. Numerical
indices  are  employed  to  quantify  the  strength  of  this  relationship. 
Both  quantitative  and  qualitative  analyses  of  each  dictionary 
(hereafter  called  cluster)  yield  those  subthemes  closely  related  to 
the  main  cluster  theme. 
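
The second step, counting the words that fall within a fixed window of every occurrence of a theme, might look like the sketch below. The function name and the sample sentence are hypothetical, only single-word cluster members are counted, and each word position is counted once even when the windows of two theme occurrences overlap; that last choice is an assumption of this sketch (it keeps the co-occurrence count from exceeding the member's total count) and is not stated in the source.

```python
from collections import Counter


def window_cooccurrence(words, theme, window=50):
    """Count single words lying within +-window words of a theme phrase.

    words:  the tokenized full text, as a list of strings
    theme:  the theme phrase as a sequence of words, e.g. ("REMOTE", "SENSING")
    Returns (Cj, Cij): the number of theme occurrences, and a Counter
    mapping each nearby word to its co-occurrence frequency.
    """
    theme = tuple(theme)
    n = len(theme)
    starts = [i for i in range(len(words) - n + 1)
              if tuple(words[i:i + n]) == theme]
    near = set()
    for i in starts:
        near.update(range(max(0, i - window), i))                   # words before
        near.update(range(i + n, min(len(words), i + n + window)))  # words after
    for i in starts:
        # Positions belonging to a theme occurrence are not cluster members.
        near.difference_update(range(i, i + n))
    return len(starts), Counter(words[p] for p in near)


# Hypothetical text, with a small window so the effect is visible.
tokens = ("RADAR IMAGES OF SEA ICE FROM REMOTE SENSING "
          "SATELLITE DATA AND ICE MAPS").split()
cj, cij = window_cooccurrence(tokens, ("REMOTE", "SENSING"), window=3)
print(cj, cij["ICE"])
```

With a window of three words, only the first occurrence of ICE falls near the theme; the second lies outside the window and is not counted.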

Third, threshold values are assigned to the numerical indices.
These thresholds are used to filter each cluster down to the words
most closely related to the cluster theme (e.g., see Figure 1 below
for part of a typical filtered cluster from the FASAC study).

Cij    Ci     Ii          Eij              CLUSTER MEMBER
              (Cij/Ci)    (Cij^2/CiCj)

022    0036   0.611       0.0359           THERMAL INFRARED
056    0323   0.173       0.0259           ICE
070    0522   0.134       0.0250           SATELLITE

CODE:

Cij IS CO-OCCURRENCE FREQUENCY, OR NUMBER OF TIMES CLUSTER MEMBER
APPEARS WITHIN +-50 WORDS OF CLUSTER THEME IN TOTAL TEXT;

Ci IS ABSOLUTE OCCURRENCE FREQUENCY OF CLUSTER MEMBER;

Cj IS ABSOLUTE OCCURRENCE FREQUENCY OF CLUSTER THEME;

Ii, THE CLUSTER MEMBER INCLUSION INDEX, IS RATIO OF Cij TO Ci; AND

Eij, THE EQUIVALENCE INDEX, IS PRODUCT OF INCLUSION INDEX BASED ON
CLUSTER MEMBER Ii (Cij/Ci) AND INCLUSION INDEX BASED ON CLUSTER
THEME Ij (Cij/Cj).

Figure  1.  Remote  Sensing  Cluster  -  Closely  Related  Words. 
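
The indices defined in Figure 1 can be checked with a short calculation. In the sketch below the Cij and Ci values are the ones printed in the figure, but the theme frequency Cj is not given there: the value of 375 is an assumption, back-calculated from the printed Equivalence Indices, and the function name is hypothetical.

```python
def inclusion_and_equivalence(cij, ci, cj):
    """Return (Ii, Ij, Eij) as defined in the Figure 1 legend.

    Ii  = Cij / Ci          inclusion index based on the cluster member
    Ij  = Cij / Cj          inclusion index based on the cluster theme
    Eij = Ii * Ij = Cij**2 / (Ci * Cj)
    """
    ii = cij / ci
    ij = cij / cj
    return ii, ij, ii * ij


# ASSUMPTION: Cj is not printed in Figure 1; 375 is inferred from the Eij column.
CJ_ASSUMED = 375

# (cluster member, Cij, Ci) as printed in Figure 1.
figure_1_rows = [
    ("THERMAL INFRARED", 22, 36),
    ("ICE", 56, 323),
    ("SATELLITE", 70, 522),
]

results = {
    member: inclusion_and_equivalence(cij, ci, CJ_ASSUMED)
    for member, cij, ci in figure_1_rows
}
for member, (ii, ij, eij) in results.items():
    print(f"{member:18s} Ii={ii:.3f}  Ij={ij:.3f}  Eij={eij:.4f}")
```

Under that assumed Cj, the computed Ii and Eij values agree with the printed columns to the number of digits shown in the figure.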

Subsets  of  closely  related  words  are  combined  into  one  file. 
Words  which  are  common  to  more  than  one  subset  (cluster  overlaps) 
are  identified.  Megaclusters,  or  strings  of  overlapping  clusters 
(based on a threshold of numbers of common words, or overlaps), are
constructed.  These  show  umbrella  areas  of  related  research. 
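
One way to assemble megaclusters from the filtered clusters is sketched below. It treats a "string of overlapping clusters" as a chain of clusters in which each neighboring pair shares at least a threshold number of members, that is, a connected group in the overlap graph. That reading, the function name, and the sample clusters are assumptions of this sketch; the source does not spell out the grouping rule. The default threshold of three matches the value used in the FASAC application.

```python
from itertools import combinations


def megaclusters(clusters, min_overlap=3):
    """Group cluster themes into chains of overlapping clusters.

    clusters: dict mapping theme -> set of filtered cluster members.
    Two themes are linked when they share at least min_overlap members;
    a megacluster is a group of themes connected through such links.
    """
    neighbours = {theme: set() for theme in clusters}
    for a, b in combinations(clusters, 2):
        if len(clusters[a] & clusters[b]) >= min_overlap:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, groups = set(), []
    for theme in clusters:
        if theme in seen:
            continue
        # Depth-first walk collects every theme reachable through overlaps.
        group, stack = set(), [theme]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical filtered clusters.
sample_clusters = {
    "REMOTE SENSING": {"ICE", "SATELLITE", "RADAR", "OCEAN"},
    "SEA SURFACE": {"SATELLITE", "RADAR", "OCEAN", "WIND"},
    "INTERNAL WAVE": {"RADAR", "OCEAN", "WIND", "SHEAR"},
    "THIN FILM": {"COATING", "SUBSTRATE", "LASER"},
}
groups = megaclusters(sample_clusters)
print(groups)
```

In the sample, REMOTE SENSING and INTERNAL WAVE share only two members, yet they land in the same megacluster because each overlaps SEA SURFACE by three. A stricter rule requiring every pair in a group to overlap would give smaller groups.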

The  final  results  identify:  (1)  the  pervasive  themes  of  the 
database;  (2)  the  relationship  among  these  themes;  and  (3)  the 
relationship of supporting sub-thrust areas (both high and low
frequency)  to  the  high-frequency  themes. 

Numbers are limited in their ability to portray the conceptual
relationships among themes and sub-themes. The qualitative
analyses  of  the  extracted  data  have  been  at  least  as  important  as 
the  quantitative  analyses.  The  richness  and  detail  of  the 
extracted  data  in  the  full  text  analysis  allows  an  understanding  of 
the  theme  interrelationships  not  heretofore  possible  with  previous 
text  abstraction  techniques  using  index  or  key  words. 

Application  of  Database  Tomography  to  FASAC  Database 

The  FSU  is  a  major  contributor  to  many  areas  of  science  and 
technology.  FASAC  reports  help  to  document  and  interpret  these 
contributions.  There  is  interest  in  preserving  the  basic  science 
capability  of  the  FSU.  This  task  would  benefit  from  improved 
understanding  of  the  FSU  science  and  technology  capability. 

Application  of  full  text  co-word  analysis  (Database 
Tomography)  to  the  FSU  component  of  the  FASAC  database  could 
provide  a  unique  perspective  on  the  FSU  science  and  technology 
capability. This database has a different structure from the
databases analyzed previously. FASAC contains topical area
assessments, whereas the other databases analyzed contain program,
project,  or  promising  opportunity  descriptions.  Full  text  co-word 
analysis  is  sufficiently  powerful  and  flexible  to  be  applicable  to 
FASAC  as  well.  (Unclassified  FASAC  reports  were  used.)  The  FASAC 
database  has  a  moderate  density  of  technical  terms.  Most  are 
scientific,  but  there  are  many  institute  names,  journal  names, 
publishers,  and  people  names.  Determination  of  the  relationship 
among  only  technical  areas  is  more  difficult  than  in  some  purely 
technically  focused  databases  which  were  analyzed  previously. 
However, the data allow analyses which go beyond purely technical
relationships.

Multiword  Frequency  Analysis 

The  output  of  the  multiword  frequency  analysis  allows 
construction  of  a  multilevel  taxonomy  of  the  full  database.  This 
taxonomy  derives  from  the  language  and  natural  divisions  of  the 
database  (analogous  to  a  natural  coordinate  system  of  the 
database). Database entries are easily categorized.

Other  taxonomies  are  generated  top-down  and  usually  attempt  to 
force-fit database subjects into pre-determined categories.

One  advantage  of  the  present  full  text  approach  over  the  index 
or  key  word  approach  is  that  many  types  of  taxonomies  can  be 
generated,  such  as:  science,  technology,  institution,  journal,  and 
person  name.  Within  any  one  of  these  categories,  such  as  science, 
many  types  of  taxonomies  can  be  developed.  An  example  of  one 
science  taxonomy  of  the  FASAC  database  will  be  shown. 

Based  on  the  high  frequency  single,  adjacent  double  and  triple 
words,  the  following  high  level  taxonomy  was  generated.  The 
capitalized  words  are  sample  high  frequency  words  from  the 
multiword  frequency  analyses: 

1. Information: DATA, IMAGE PROCESSING, STATISTICAL PATTERN
RECOGNITION

2. Physics: LASER, SHOCK WAVE, CHARGED PARTICLE ACCELERATORS

3. Environment: OCEAN, SEA SURFACE, INTERNAL GRAVITY WAVES

4. Materials: MATERIALS, THIN FILM, METAL MATRIX COMPOSITES

Caution  must  be  exercised  in  relating  the  above  taxonomy  based 
on  FASAC  to  the  actual  taxonomy  of  all  of  FSU  science.  The  FASAC 
reports  represent  selected  areas  of  FSU  science.  It  is  not  known 
how  representative  all  the  FASAC  reports  are  of  total  FSU  science. 
The  FASAC  reports  tend  to  reflect  the  open  FSU  literature.  It  is 
not  known  how  well  this  open  literature  represents  all  of  FSU 
science,  including  classified  work  and  other  unreported  work. 

The  above  taxonomy  reflects  frequency  of  word  usage.  It 
represents  the  numbers  of  words  written  about  technical  areas  in 
the  FASAC  reports.  Dollars  spent  on  these  areas,  or  other  measures 
of  FSU  priorities,  were  not  taken  into  account.  The  taxonomy  could 
be  skewed  relative  to  FSU  importance  attached  to  these  areas. 
Nevertheless, the above taxonomy does offer insight into areas of
FSU  science  of  interest  to  the  U.S. 

Megaclusters

Clusters  which  had  three  or  more  overlaps  (three  or  more 
common  members)  were  combined  to  form  strings  of  related  clusters, 
or  megaclusters.  The  following  megaclusters  were  obtained: 
Ionospheric Heating/Modification, Image/Optical Processing, Air-Sea
Interface,  Low  Observable,  Explosive  Combustion,  Particle  Beams, 
Automatic/Remote  Control,  Frequency  Standards,  Radar  Cross  Section. 
Of  the  60  cluster  themes  that  were  used  to  compute  overlaps,  52 
were in one of the nine megaclusters above. Most of the eight
remaining themes could be subsumed under the nine megaclusters.

The  science  discipline  taxonomy  for  the  FASAC  database  was 
derived from the multiword frequency analysis. It was defined as
Information, Physics, Environment, and Materials. In terms of the
megaclusters:

1. Information would encompass: IMAGE/OPTICAL PROCESSING,
AUTOMATIC/REMOTE CONTROL

2. Physics would encompass: IONOSPHERIC
HEATING/MODIFICATION, PARTICLE BEAMS, FREQUENCY
STANDARDS, RADAR CROSS SECTION

3. Environment would encompass: AIR-SEA INTERFACE

4. Materials would encompass: EXPLOSIVE COMBUSTION, LOW
OBSERVABLE

Categorizing  the  database  with  the  megacluster  subcategories 
allows the re-interpretation of the FASAC database as a compendium
of  those  aspects  of  FSU  science  of  interest  to  the  U.S.  for 
strategic  and  military  purposes  rather  than  a  microcosm  of  all  of 
FSU  science.  For  example,  many  classes  of  materials  were 
researched  and  developed  in  the  FSU.  Yet  the  materials  subcategory 
in  the  FASAC  analysis  focuses  on  FSU  capabilities  in  energetic 
materials  (explosives  and  propellants)  and  coatings  to  reduce  radar 
cross  sections.  Both  classes  are  important  from  a  military 
viewpoint.  The  main  environmental  focus  is  air-sea  interface. 
There  is  little  mention  of  the  terrestrial  environment.  The 
primary  information  category  focus  is  on  image  and  optical 
processing,  and  the  secondary  information  category  focus  is  on 
remote  control.  One  could  conclude  that  the  FASAC  concern  was  FSU 
capability  in  sensing  the  ocean  for  ship  and  submarine  activity, 
and  remotely  processing  and  interpreting  this  information. 

The  secondary  environmental  focus  of  FASAC  was  on  FSU 
capabilities  for  modifying  the  ionosphere  through  high  power  radio 
wave  heating  and  exploiting  its  use  as  a  communication  medium.  One 
focus  of  the  physics  category  was  particle  beams.  These  could  have 
dual  applications  of  high  energy  directed  weapons  and  heaters  for 
magnetically  confined  plasmas  and  inertial  fusion  targets. 

Cluster  Theme/Member  Relationships 

The final display, Figure 2, shows high technical content
words from one of the smallest of the 60 clusters. The selection
cutoff  criterion  was  an  Equivalence  Index  (see  Figure  1  for 
definition)  greater  than  or  equal  to  0.001.  A  simple  division  of 
word  categories  into  quadrants  based  on  Inclusion  Index  values  was 
used  to  display  the  relationships  of  the  cluster  members  to  the 
cluster  theme  and  to  each  other. 

ATMOS OCEANIC PHYS CLUSTER - HIGH TECHNICAL CONTENT WORDS

HIGH Ij HIGH Ii (upper quadrant)

(no members)

HIGH Ij LOW Ii (left quadrant)

SEA
INTERNAL WAVE
ACOUSTIC
SCATTERING
RADAR
SEA SURFACE
ATMOSPHERE

LOW Ij HIGH Ii (right quadrant)

RADIOACOUSTIC SOUNDING
ACOUSTIC SOUNDING
THEORY OF WIND
MODELING OF SURFACE
ATTENUATION OF SOUND
INFRASOUND AND INTERNAL
THEORY OF WAVE

LOW Ij LOW Ii (bottom quadrant)

WIND WAVES               SHEAR FLOW      PROCESSING OF RADAR
SOUND PROPAGATION        TURBULENT       WAVE PROPAGATION
OCEAN SURFACE            SATELLITE       WIND VELOCITY
INTERNAL GRAVITY WAVES   POINT SOURCE
STRATIFIED FLUID         SOUND WAVES

Figure  2.  High  Technical  Content  Words  of  Final  Display. 

In Figure 2, the topic in the heading, ATMOS OCEANIC
PHYS, is the cluster theme. The cluster members are segregated
into quadrants headed by their values of Inclusion Indices. Ij is
the ratio of Cij to Cj, and is the Inclusion Index based on the
theme word. Ii is the ratio of Cij to Ci, and is the Inclusion
Index based on the cluster member. The dividing points between
high and low Ij and Ii are the middle of the "knee" of the
distribution functions of numbers of cluster members vs. values of
Ij and Ii. All cluster members with Ij greater than or equal to
0.1 were defined as having high Ij. All cluster members with Ii
greater than or equal to 0.5 were defined as having high Ii.
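
The quadrant assignment can be written as a small rule. The cutoffs of 0.1 for Ij and 0.5 for Ii are the ones stated above; the function name, the quadrant labels returned, and the sample index values are illustrative assumptions.

```python
def quadrant(ii, ij, ii_cut=0.5, ij_cut=0.1):
    """Assign a cluster member to a display quadrant from its inclusion indices.

    ii: inclusion index based on the cluster member (Cij / Ci)
    ij: inclusion index based on the theme word     (Cij / Cj)
    """
    high_ii = ii >= ii_cut
    high_ij = ij >= ij_cut
    if high_ij and high_ii:
        return "upper"    # high Ij, high Ii
    if high_ij:
        return "left"     # high Ij, low Ii
    if high_ii:
        return "right"    # low Ij, high Ii
    return "bottom"       # low Ij, low Ii


# Hypothetical (Ii, Ij) pairs, one per quadrant.
examples = {
    "upper": (0.60, 0.20),
    "left": (0.20, 0.15),
    "right": (0.70, 0.05),
    "bottom": (0.10, 0.01),
}
for expected, (ii, ij) in examples.items():
    print(expected, quadrant(ii, ij))
```

Because the cutoffs were read off the knee of each distribution for this particular database, they would likely need to be re-derived before the rule is applied to a different text collection.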

A  high  value  of  Ij  means  that,  whenever  the  theme  word  appears 
in  the  text,  there  is  a  high  probability  that  the  cluster  member 
will  appear  within  +-50  words  of  the  theme  word.  A  high  value  of 
Ii means that, whenever the cluster member appears in the text,
there  is  a  high  probability  that  the  theme  word  will  appear  within 
+-50  words  of  the  cluster  member. 

Thus, words located in the upper quadrant (high Ij high Ii)
are  coupled  very  strongly  to  the  theme  word.  Whenever  the  theme 
word  appears,  there  is  a  high  probability  that  the  cluster  member 
will  be  physically  close.  Whenever  the  cluster  member  appears, 
there  is  a  high  probability  that  the  theme  word  will  be  physically 
close.  Whenever  either  word  appears  in  the  text,  the  other  will  be 
physically  close. 

Consider words located in the left quadrant (high Ij low Ii).
Whenever  the  cluster  member  appears  in  the  text,  there  is  a  low 
probability  that  it  will  be  physically  close  to  the  theme  word. 
Whenever  the  theme  word  appears  in  the  text,  there  is  a  high 
probability  that  it  will  be  physically  close  to  the  cluster  member. 
This  type  of  situation  occurs  when  the  frequency  of  occurrence  of 
the  cluster  member  Ci  is  substantially  larger  than  the  frequency  of 
occurrence of the theme word Cj, and the cluster member and the
theme  word  have  some  related  meaning. 

Single words have absolute frequencies of an order of
magnitude higher than double words. Thus, the words in the left
quadrant are typically high frequency single words. They are
related to the theme word but much broader in meaning than the
theme word. A small fraction of the time that these broad single
words appear, the more narrowly defined double word theme will
appear  physically  close.  However,  whenever  the  narrowly  defined 
double  word  theme  appears,  the  broader  related  single  word  cluster 
member  will  appear.  The  words  in  the  left  quadrant  can  also  be 
viewed  as  a  higher  level  taxonomy  of  technical  disciplines  related 
to  the  theme  ATMOS  OCEANIC  PHYS. 

Consider words located in the right quadrant (low Ij high Ii).
Whenever  the  cluster  member  appears  in  the  text,  there  is  a  high 
probability  that  it  will  be  physically  close  to  the  theme  word. 
Whenever  the  theme  word  appears  in  the  text,  there  is  a  low 
probability  that  it  will  be  physically  close  to  the  cluster  member. 
This  type  of  situation  occurs  when  the  frequency  of  occurrence  of 
the  cluster  member  Ci  is  substantially  smaller  than  the  frequency 
of occurrence of the theme word Cj, and the cluster member and the
theme  word  have  some  related  meaning.  Thus,  the  words  in  the  right 
quadrant  tend  to  be  low  frequency  double  and  triple  words,  related 
to  the  theme  word  but  very  narrowly  defined. 

A  large  fraction  of  the  time  that  these  very  narrow  double  and 
triple  words  appear,  the  relatively  broader  double  word  theme  will 
appear  physically  close.  However,  a  small  fraction  of  the  time 
that  the  relatively  broad  double  word  theme  appears,  the  more 
narrow  double  and  triple  word  cluster  member  will  appear.  This 
quadrant grouping has the potential for identifying
"needle-in-a-haystack" type thrusts which occur infrequently but
strongly  support  the  theme  when  they  do  occur.  One  of  many 
advantages  of  full  text  over  key  or  index  words  is  this  illustrated 
ability  to  retain  low  frequency  but  highly  important  words,  since 
the  key  word  approach  ignores  the  low  frequency  words. 

The words in the bottom quadrant (low Ij low Ii) are the
remainder  of  the  culled  words.  They  relate  to  and  support  the 
theme,  but  do  not  have  the  strong  inclusions  based  on  theme  or 
cluster  member  occurrence  of  the  members  of  the  other  quadrants. 
The  upper  quadrant  typically  contains  very  few  or  no  words.  The 
left  quadrant  contains  very  broad  words  related  to  the  theme.  The 
right  quadrant  contains  extremely  narrow  words  related  to  the 
theme.  The  bottom  quadrant  contains  words  related  to  the  theme  of 
the same level of specificity as the theme (on average).

Figure  2,  ATMOS  OCEANIC  PHYS,  has  a  null  upper  quadrant 
(typical  of  the  majority  of  clusters  for  the  threshold  values  of 
Equivalence Index chosen). The left quadrant, the broad taxonomy
of related areas, appears to describe two major thrusts:

1.  Underwater  related  (SEA,  INTERNAL  WAVE,  ACOUSTIC, 
SCATTERING)  focusing  on  sound  propagation  through  the  sea. 

2.  Atmosphere  related  (ATMOSPHERE,  RADAR,  SEA  SURFACE, 
SCATTERING)  focusing  on  radar  propagation  through  the  atmosphere. 

The  thrusts  have  a  common  juncture  at  the  sea  surface,  where 
both acoustic and radar scattering occur on different sides.

The  right  quadrant  focuses  on  very  specific  subareas  related 
primarily  to  acoustics.  These  include  acoustics  applied  to  the 
atmosphere (RADIOACOUSTIC SOUNDING), and other aspects of
atmospheric  science  (THEORY  OF  WIND) . 

The  bottom  quadrant  provides  the  most  balanced  view  of  the  two 
thrusts. It expands on the underwater propagation medium
(STRATIFIED FLUID, SHEAR FLOW, INTERNAL GRAVITY WAVES), the radar
platform issues (SATELLITE, PROCESSING OF RADAR), and the ocean
surface issues (WIND WAVES, TURBULENT, OCEAN SURFACE). The
integrated  picture  presented  by  the  three  quadrants  is  the  use  of 
radar  from  a  space  platform  to  view  the  ocean  surface,  and  the 
research  problems  arising  from  the  wind  and  undersea  flows 
governing  the  conditions  and  structure  of  the  ocean  surface  and 
impacting  the  interpretation  of  the  radar  images. 

CONCLUSIONS  FROM  DATABASE  TOMOGRAPHY  STUDY 

Based  on  the  results  and  interpretation  of  the  multiword 
frequency  analysis  and  the  co-word  analysis,  the  FASAC  database 
used in this study is a compendium of those aspects of FSU science
of  interest  to  the  U.S.  for  strategic  and  military  purposes.  The 
microlevel  analysis  of  selected  theme  clusters,  showing  how  the 
cluster  members  related  to  each  theme,  reinforced  this  conclusion 
and  provided  more  detail  about  those  aspects  of  each  theme  on  which 
FASAC  concentrated. 

A  wealth  of  information  resulted  from  the  FASAC  output,  and 
only  a  small  fraction  of  that  information  was  presented  and 
analyzed  in  this  study.  The  analysis  was  restricted  to  technical 
themes  and  their  relationships.  Raw  data  was  available  for 
relating  technical  themes  to  non-technical  themes  such  as 
institutions, scientists, journals, and geographical regions.

In  the  future,  full  text  co-word  analysis  could  be  used  to 
obtain  a  more  representative  structure  of  FSU  (or  any  other 
country's)  science.  If  a  large  number  of  randomly  selected 
published  FSU  scientific  papers  were  entered  into  a  database,  then 
a  multiword  frequency  analysis  and  co-word  analysis  could  be 
performed  on  this  text  database. 

Assume  that  a  paper  represents  about  $100K  worth  of  effort. 
A  10,000  paper  database  would  represent  $1B  worth  of  effort,  and 
would  offer  a  very  representative  sample  of  FSU  science  output. 
The 10,000 paper database could be analyzed on an existing, advanced
desktop  computer.  The  critical  path  would  be  assembling  this 
database,  not  analyzing  it. 

Full  text  co-word  analysis  is  in  its  formative  stages.  Much 
development  remains  to  be  done  to  understand  the  breadth  of 
analyses  which  can  be  performed  and  the  breadth  of  applications 
which  can  be  covered. 

NETWORK MODELING FOR DIRECT/INDIRECT IMPACTS


Background 
In a mission-oriented research-sponsoring organization, the
selection  and  continuation  of  research  programs  must  be  made  on  the 
basis  of  outstanding  science  and  potential  contribution  to  the 
organization's  mission.  There  have  been  increasing  pressures  to 
link  science  and  technology  programs  and  goals  more  closely  and 
clearly  to  organizational  as  well  as  broader  societal  goals 
[Carnegie,  1992].  The  process  of  estimating  potential  impact  of 
research,  especially  basic  research,  on  organizational  and  societal 
goals  is  complex  due  to  the  myriad  of  pathways  by  which  the 
research  product  can  effect  its  impact. 

Most  resource-allocation  methods  in  the  literature  that 
incorporate organizational objectives tend to be qualitative when
addressing  basic  research,  and  more  quantitative  when  addressing 
applied  research  allocation. 

-(See  Logsdon  [1985],  OTA  [1986],  Hall  [1990],  IEEE  [1974, 
1983],  Baker  [1964],  Cetron  [1967],  Datz  [1974],  Baker  [1974, 
1975],  Winkofsky  [1980]  for  reviews  which  compare  selection  methods 
and  sort  these  methods  into  categories  or  classes; 

-see  Kostoff  [1983],  Hazelrigg  [1982],  Helin  [1974],  Souder 
[1978],  Cook  [1982],  Nutt  [1965],  Souder  [1975],  Van  de  Ven  [1971], 
Plebani  [1981],  Mottley  [1959],  Garguilo  [1981],  Gear  [1971],  Pound 
[1964],  Dean  [1965],  Moore  [1969],  Gustafson  [1971],  McGuire 
[1973],  Paolini  [1977],  Cooper  [1978],  Ramsey  [1978],  Krawiec 
[1984],  Gear  [1974],  Keefer  [1978],  Madey  [1985],  Liberatore 
[1987],  Dean  [1962],  Cramer  [1964],  Vanston  [1977],  Bell  [1967], 
Cochran  [1971],  Themelis  [1976],  Aaker  [1978],  Liberatore  [1981], 
Silverman  [1981],  Menke  [1983],  Ellis  [1984],  Hertz  [1964],  Hespos 
[1965],  Maher  [1974],  Schwartz  [1977]  for  benefit  measurement 
methods  [develop  quantitative  measures  of  the  benefit  of  performing 
an  R&D  project,  then  select  those  projects  which  provide  greatest 
benefit]  as  defined  in  Hall  [1990]; 

-see  Watters  [1967],  Asher  [1962],  Beged  Dov  [1965],  Baker 
[1969],  Souder  [1973],  Keown  [1979],  Winkofsky  [1981],  Taylor 
[1982],  Hess  [1962],  Rosen  [1965],  Atkinson  [1969]  for  constrained 
optimization  approaches  [optimize  some  objective  function  subject 
to  specified  resource  constraints]  as  defined  in  Hall  [1990]; 

-see  Cooper  [1981],  Stahl  [1983],  Lockett  [1970],  Mandakovic 
[1985]  for  cognitive  emulation  models  [establish  an  actual  model  of 
the  decision  making  process  within  an  organization]  as  defined  in 
Hall  [1990]) 

Almost  all  of  the  allocation  techniques  in  the  literature  are 
more  appropriate  for  the  applied  research,  or  development, 
projects. Use of R&D project selection models falls into three
categories [Roessner, 1985]:

1. A decision maker is influenced on a particular decision by
the findings of a specific piece of research (instrumental use);

2. A decision maker finds that a piece of research contains
ideas or information that contribute to the work of his/her
organization (conceptual use);

3. A decision maker uses research to advance his/her own
self-interest (partisan use).
Whether these allocation techniques are categorized according
to OTA [1986] (scoring models, economic models, constrained
optimization models, risk analysis models), or categorized
according  to  Hall  [1990]  (constrained  optimization  methods,  benefit 
measurement  methods,  cognitive  emulation  models,  ad  hoc  methods, 
surveys)  these  techniques  require,  in  practice,  a  project's 
development  and  payoff  characteristics.  These  characteristics  can 
be  estimated  when  a  project's  downstream  development  phase  can  be 
identified,  such  as  for  some  types  of  applied  research,  and  for 
many  types  of  development  projects.  For  many  areas  of  basic 
research,  development  and  payoff  characteristics  are  not  obvious. 
There  do  not  appear  to  be  viable  quantitative  resource  allocation 
models  applicable  to  basic  research. 

This  section  discusses  a  network  based  modeling  approach  which 
would  allow  estimation  of  the  direct  and  indirect  impacts  of  a 
research  program  or  collection  of  research  programs.  The  research 
program  impacts  would  be  multi-faceted,  including  impacts  on 
advancing  its  own  field,  on  advancing  allied  fields,  on  advancing 
technology,  on  supporting  operations  and  mission  requirements,  etc. 
The  model  proposed  here  differs  from  any  reported  in  the  literature 
in  that  it  reflects  more  accurately  the  different  types  of  impact 
which  basic  research  generates.  A  major  feature  of  the  model  is 
inclusion  of  feedback  from  the  higher  development  categories  (e.g., 
exploratory  development,  advanced  development)  on  the  advancement 
of  research. 

Philosophy  of  Proposed  Network  Approach 

Existing matrix-based research impact models [Dean, 1972;
Ibrahim, 1984] are most useful for applied R&D concepts and
utilize a vertical impact structure (forward diffusion of
knowledge) where the impacts of research flow forward only to the
more advanced development categories (e.g., research ->
development -> systems). The proposed model uses a structure of
lateral and backward diffusion of knowledge superimposed on the
vertical impact structure (e.g., research -> research ->
development -> research -> development -> systems). The
proposed model accounts for the upward impacts of research (forward
diffusion) allowed by the present models. It also allows one
research field to impact another research field (lateral diffusion)
and allows the higher development categories to impact research as
well (backward diffusion).

For  example,  a  matrix  model  approach  could  have  a  vertical 
impact  structure  path  consisting  of  Physics  (research)  impacting 
Lasers (technology) impacting Beam Weapons (systems). The proposed
network model would include this path, but many others as well,
including Physics (research) impacting Lasers (technology)
impacting Nanoelectronics (research) impacting Controls
(technology) impacting Beam Weapons (systems), and including
Physics  (research)  impacting  Lasers  (technology)  impacting  Fluid 
Flow  Visualization  (research)  impacting  Helicopter  Blade  Design 
(technology)  impacting  Helicopters  (systems) . 
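
The paths named in this example can be traced with a short sketch. The link table below contains only the links mentioned in the text (a real network would connect every pair of nodes), and the function name `simple_paths` is an illustrative choice, not part of the proposed method.

```python
# Directed links taken from the example paths in the text.
LINKS = {
    "Physics": ["Lasers"],
    "Lasers": ["Beam Weapons", "Nanoelectronics", "Fluid Flow Visualization"],
    "Nanoelectronics": ["Controls"],
    "Controls": ["Beam Weapons"],
    "Fluid Flow Visualization": ["Helicopter Blade Design"],
    "Helicopter Blade Design": ["Helicopters"],
}


def simple_paths(links, start, goal, path=None):
    """Yield every loop-free path from start to goal as a list of node names."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in links.get(start, []):
        if nxt not in path:  # do not revisit a node already on this path
            yield from simple_paths(links, nxt, goal, path)


for target in ("Beam Weapons", "Helicopters"):
    for p in simple_paths(LINKS, "Physics", target):
        print(" -> ".join(p))
```

A matrix model would report only the first, three-node path to Beam Weapons; the other two paths pass from technology back into research (Lasers to Nanoelectronics, Lasers to Fluid Flow Visualization) and appear only when backward diffusion is allowed.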
The impact of much basic research, especially on the higher
development  categories  such  as  systems  development,  proceeds 
through  many  indirect  paths.  A  quantitative  model  of  impact  should 
have  the  capability  of  identifying  the  paths  along  which  impact 
occurs  and  quantifying  the  impact  along  as  many  paths  as  is 
possible.  The  existing  forward  diffusion  matrix-based  models  are 
severely  constrained  on  the  number  and  types  of  paths  along  which 
impact  occurs.  These  models  are  not  able  to  account  for  impact 
along  lateral  diffusion  paths  (e.g.,  research-research)  or  along 
backward  diffusion  paths  (e.g.,  technology-research).  The  proposed 
model  allows  impact  to  occur  along  any  of  these  paths,  and  thus 
includes  many  types  of  indirect  impacts  as  well  as  direct  impact. 

Example: Differences between Matrix and Network Approaches

A  simple  example  will  show  the  difference  in  breadth  of  impact 
allowed  between  the  proposed  model  and  a  leading  existing  matrix- 
based  model  [Dean,  1972].  Assume  it  is  desired  to  compute  the 
impact  of  a  research  project  R  on  a  technology  project  T.  In  the 
standard  methodology,  it  is  only  necessary  to  examine  ONE  path  from 
R  to  T.  This  is  the  path  of  direct  impact,  and  the  value  of  the 
impact  is  the  value  of  the  matrix  element  RT. 

In  the  proposed  methodology,  R  and  T  are  two  nodes  in  a  fully 
connected  network.  All  possible  paths  between  R  and  T  are  examined 
when  computing  the  total  impact  of  R  on  T.  Thus,  the  overwhelming 
majority  of  paths  which  contribute  to  the  total  impact  of  R  on  T 
are  the  indirect  impact  paths.  The  total  impact  of  R  on  T  is  the 
sum  of  the  link  value  products  along  EVERY  path  connecting  R  to  T. 
Continuing  the  example  above,  R  could  be  the  Physics  research  node 
and  T  could  be  the  Laser  technology  node.  In  the  standard  matrix 
approach,  only  the  direct  impact  of  Physics  on  Lasers  is 
considered.  In  the  proposed  methodology,  additional  paths  between 
Physics  and  Lasers,  such  as  Physics  impacting  Fluid  Dynamics 
research  impacting  Lasers  or  Physics  impacting  Solid  State 
Materials  research  impacting  Lasers,  would  also  be  considered. 

For  a  graph  with  a  large  number  of  nodes  N,  there  are 
approximately  e*m!  paths  (ranging  in  length  from  1  to  N-1  links) 
connecting  R  to  T,  where  m  is  N-2.  In  the  pilot  study  performed  to 
test  the  validity  of  the  proposed  model  and  overviewed  in  this 
Handbook,  the  graph  that  was  used  consisted  of  15  research  nodes 
and  27  technology  nodes.  For  the  pilot  study  graph,  e*m!  is 
approximately  10  to  the  47th  power.  In  this  simple  example  based 
on  the  small  pilot  study  grid,  the  proposed  method  could 
theoretically  examine  link  value  products  along  47  orders  of 
magnitude  more  paths  than  does  the  standard  method.  In  the  actual 
pilot  study,  link  value  products  were  computed  along  all  paths  five 
links  or  less  in  length.  This  means  that  approximately  m^4,  or  2.5 
million  paths  connecting  R  to  T,  were  examined.  This  same  order  of 
magnitude  differential  holds  between  the  proposed  method  and  the 
other  matrix-based  methods  which  were  examined  before  the  proposed 
method  was  devised. 
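
The path counts in this example can be checked with a short calculation. The sketch below assumes loop-free paths between two fixed nodes of a fully connected network; `count_paths` is a hypothetical helper name. A path with k intermediate nodes has k + 1 links, and its intermediates can be chosen and ordered in m!/(m-k)! ways, where m = N-2.

```python
import math


def count_paths(n_nodes, max_links=None):
    """Count loop-free paths between two fixed nodes of a fully connected network."""
    m = n_nodes - 2  # nodes available as intermediates
    max_k = m if max_links is None else min(m, max_links - 1)
    # math.perm(m, k) = m! / (m - k)!  (ordered choices of k intermediates)
    return sum(math.perm(m, k) for k in range(max_k + 1))


all_paths = count_paths(42)                  # every length, 1 to N-1 links
short_paths = count_paths(42, max_links=5)   # five links or less
print(f"all paths:   {all_paths:.2e}")
print(f"e * m!:      {math.e * math.factorial(40):.2e}")
print(f"<= 5 links:  {short_paths}")
```

For the 42-node pilot grid the exact sum agrees with e*m! and evaluates to roughly 2 x 10^48, while the count for paths of five links or less is about 2.25 million, close to the m^4 estimate of 2.5 million.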

Of  equal  importance  to  the  quantitative  difference  between  the 
two  methods  is  the  qualitative  difference.  The  proposed  approach 
allows full weight to be given to those research projects which
have  large  indirect  impacts.  Many  of  the  fundamental  research 
areas,  such  as  Mathematics,  Physics,  etc.,  have  substantial  impacts 
on  other  research  areas  (as  well  as  technologies) ,  and  these 
indirect  impacts  are  not  fully  captured  in  the  matrix-based 
methods.  Since  the  fundamental  research  areas  tend  to  have 
indirect  impact  on  many  research  and  technology  areas,  when  the 
impact  is  summed  over  all  research  and  technology  areas,  the  total 
impact of these fundamental research areas becomes substantial.

For  any  organization  with  a  substantial  fraction  of  its  budget 
in  these  fundamental  research  areas,  a  method  that  is  able  to 
capture  the  sizeable  indirect  impacts  of  basic  research  is 
important.  For  an  advanced  technology  development  organization, 
where  the  impacts  of  the  work  are  more  focused  to  specific 
technologies  and  requirements,  the  benefits  of  the  proposed 
multipath  approach  may  be  less  (although  they  will  always  be 
greater  than  those  of  the  matrix  approaches,  since  the  proposed 
method includes all the paths in the matrix approach and others).

The  remainder  of  this  section  describes  the  proposed  method, 
an  overview  of  the  preliminary  pilot  study  that  was  performed  to 
test  the  feasibility  of  the  method,  key  lessons  learned  from  the 
pilot  study,  and  recommendations  for  an  enhanced  study. 

METHODOLOGY 

Creating  Domains  and  Forming  the  Network 

The  research  impact  quantification  methodology  presented  here 
displays  the  value  of  a  given  research  program  to  advancing  its  own 
field,  to  supporting  other  research  areas,  to  supporting 
technology,  and  to  supporting  mission  requirements.  The  first  step 
in  the  methodology  is  defining  a  domain  of  potential  impacts.  For 
example,  if  the  impact  of  research  on  other  research,  technology, 
and  systems  is  desired,  then  the  three-level  domain  for  the  model 
would  be  research,  technology,  and  systems.  Each  of  these  levels 
is  subdivided  further  into  a  number  of  categories. 

As  a  specific  example,  in  the  two-level  domain  (research, 
technology)  pilot  study  that  will  be  overviewed,  research  was 
divided  into  15  categories  (math,  physics,  chemistry,  etc.)  and 
technology  was  divided  into  27  categories  (training,  navigation, 
countermeasures, etc.). These categories had the property of being
relatively  non-overlapping,  and  were  similar  to  categories  being 
used  by  the  Navy  for  management  purposes  at  the  time  of  the  study. 
All  42  categories  are  represented  as  nodes  in  a  network. 

Since  it  is  assumed  that  research,  technology,  and  missions 
are  interlocked  and  have  mutual  impacts  with  different  strengths  of 
connectivity,  each  pair  of  categories  (nodes)  can  be  visualized  as 
connected with a line (link). This schematic has the form of a
graph, or network, in which all node pairs are connected. The
lines,  or  links,  which  connect  each  pair  of  nodes,  are  allowed  to 
have  two  values,  depending  on  direction  between  the  nodes.  This 
allows  any  research,  technology,  or  missions  area  at  the  lowest 
category breakdown level to impact any other research, technology,
or missions area with a specified strength.

Since  one  of  the  desired  outputs  of  the  proposed  procedure  is 
impact  of  research,  and  since  research,  technology,  and  missions 
are  assumed  to  have  mutual  impacts,  then  the  generic  computational 
problem  is  to  obtain  the  impact  of  one  node  of  the  network  on  any 
other  node  in  the  network.  Three  interrelated  types  of  impact 
(DIRECT IMPACT, IMPACT, TOTAL IMPACT) of one node on any other node
will  now  be  described. 

In this multi-node network, assume 'a' is one node, 'b' is a
second node, and 'x' is a third node. The DIRECT IMPACT of node
'a' on node 'b', or more specifically, the direct importance of
results from node 'a' to the achievement of objectives of node 'b',
is the value $L_{ab}$ of the link directed from node 'a' to node 'b'.
Thus, if 'a' represents a research node (partial differential
equations, for example), and 'b' represents a technology node
(short wavelength lasers, for example), then $L_{ab}$ would represent
the direct importance (or DIRECT IMPACT) of research results in
partial differential equations to the achievement of development
objectives of short wavelength lasers. The scale of $L_{ab}$ ranges
from 0% importance, which means results from node 'a' have no
impact on achievement of objectives of node 'b', to 100%
importance, which means results from node 'a' are absolutely
crucial to the achievement of objectives of node 'b'.

The IMPACT of node 'a' on node 'b', along any multi-link path
connecting node 'a' to node 'b', is defined as the product of the
link values (DIRECT IMPACTS) along the path. On the two-link path
'a'-'x', 'x'-'b', the IMPACT is the product $L_{ax} L_{xb}$. Thus, if
results from work in node 'a' are 25% important to obtaining
objectives in node 'x', and results from work in node 'x' are 25%
important to obtaining objectives in node 'b', then the IMPACT of
node 'a' on node 'b' along the two-link path 'a'-'x', 'x'-'b' is
about 6% (0.25 x 0.25 = 0.0625). Other functions to represent IMPACT
along the multi-link path could be defined, but the product of link
values appears to be simplest and easiest intuitively to relate to
reality.

The TOTAL IMPACT of node 'a' on node 'b' is defined as the sum
of the IMPACTS along every path connecting node 'a' to node 'b' and
is the main figure of merit used in the present study. The
computational problem for obtaining TOTAL IMPACT of node 'a' on
node 'b', then, is to trace each path from node 'a' to node 'b',
compute the link value products along each path to obtain the
IMPACT of 'a' on 'b' along the path, and sum the IMPACTS over all
the paths connecting node 'a' to node 'b'. To eliminate double
counting, and to ensure that the IMPACT of node 'a' on node 'b'
decreases as more links are added to the particular path connecting
node 'a' to node 'b', the sum of the values of all the links coming
into node 'b' should not exceed unity.
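
The definitions of IMPACT and TOTAL IMPACT can be illustrated with a minimal sketch. It is one possible reading of the definitions, not the pilot study's program. The function name, the dictionary layout, and the 50% direct link from 'a' to 'b' are assumptions made for illustration; the two 25% links come from the example above. Nodes may be revisited along a path, which the text permits when the steps occur at different times.

```python
def total_impact(links, a, b, max_links):
    """Sum link-value products over every path from a to b of at most max_links links.

    links[src][dst] is the DIRECT IMPACT of src on dst, as a fraction from 0 to 1.
    """
    reach = {a: 1.0}  # summed link-value products of all k-link paths from a, by end node
    total = 0.0
    for _ in range(max_links):
        step = {}
        for src, weight in reach.items():
            for dst, value in links.get(src, {}).items():
                step[dst] = step.get(dst, 0.0) + weight * value
        total += step.get(b, 0.0)  # IMPACT contributed by paths of this length
        reach = step
    return total


# 'a' -> 'x' and 'x' -> 'b' are 25% each (from the text); 'a' -> 'b' at 50% is assumed.
links = {"a": {"x": 0.25, "b": 0.5}, "x": {"b": 0.25}}
print(total_impact(links, "a", "b", max_links=1))  # direct impact only
print(total_impact(links, "a", "b", max_links=2))  # adds the a-x-b path, 0.25 * 0.25
```

Here the links coming into 'b' sum to 0.75, so the condition that they not exceed unity is satisfied.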

Normalizing  Link  Values 

This condition is incorporated into the computational process
by using a normalized value for each link value in place of the
value provided by the data source; i.e.,

$$L'_{ij} = L_{ij} \, \frac{1 - L_{jj}}{\sum_{k \neq j} L_{kj}}$$

where $L_{ij}$ is the data source link value, $L'_{ij}$ is
the normalized link value, $L_{jj}$ represents the fraction of the
objectives within node 'j' that can be achieved without input of
results from any other nodes in the network, and the sum is taken
over all the links coming into node 'j'. The equations without
further constraints allow loops to exist in the network. For
example, a three-link path between node 'a' (Math) and node 'b'
(Lasers) could be node 'a' to node 'x' (Physics), node 'x' to node
'a', and node 'a' to node 'b'. While this would be viewed as double
counting if it were to occur at one point in time, it is perfectly
valid when these steps among nodes occur at different times. Thus,
the IMPACT of node 'a' on node 'b' has to be interpreted as a
cumulative impact over time and is a function of the length of the
path from node 'a' to node 'b'. An exact solution for the IMPACT
would  therefore  require  link  values  for  every  step  in  time  from  the 
present  to  the  computational  time  horizon.  Further,  each  of  these 
link  values  could  not  be  obtained  independently,  but  would  require 
knowledge  of  the  link  values  connecting  all  the  nodes  at  the 
previous  time  step,  since  progress  in  any  one  node  is  assumed  to 
depend  on  previous  progress  in  all  of  research  and  technology.  To 
keep  the  computational  and  data  generation  problem  manageable,  an 
approximate  solution  is  obtained  by  treating  the  link  values  as 
constants  rather  than  functions  of  time,  and  interpreting  and 
providing  the  link  values  as  time-averaged  quantities.  Without 
knowledge  of  the  variation  of  the  link  values  with  time,  a  credible 
estimation  of  the  error  resulting  from  the  constant  link  value 
assumption  cannot  be  made. 
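
The normalization defined at the start of this subsection can be sketched as follows. The node names, raw scores, and self-sufficiency fractions (the L jj values) below are hypothetical, chosen only to show the arithmetic; `normalize_links` is an illustrative name.

```python
def normalize_links(raw, self_fraction):
    """Return L'[i][j] = L[i][j] * (1 - L_jj) / (sum of raw links coming into j).

    raw[i][j]        : data-source link value from node i to node j (i != j)
    self_fraction[j] : L_jj, the share of node j's objectives achievable
                       without results from any other node
    """
    incoming = {}
    for src, targets in raw.items():
        for dst, value in targets.items():
            incoming[dst] = incoming.get(dst, 0.0) + value
    normalized = {}
    for src, targets in raw.items():
        normalized[src] = {}
        for dst, value in targets.items():
            if incoming[dst] > 0:
                normalized[src][dst] = value * (1.0 - self_fraction[dst]) / incoming[dst]
            else:
                normalized[src][dst] = 0.0  # node receives nothing from the network
    return normalized


# Hypothetical raw scores on a 0-10 scale and assumed L_jj values.
raw = {
    "Math": {"Physics": 8, "Lasers": 4},
    "Physics": {"Math": 2, "Lasers": 6},
    "Lasers": {"Math": 0, "Physics": 2},
}
self_fraction = {"Math": 0.5, "Physics": 0.2, "Lasers": 0.1}
norm = normalize_links(raw, self_fraction)
for node in raw:
    into = sum(norm[src].get(node, 0.0) for src in norm)
    print(node, round(into, 3))  # equals 1 - L_jj for each node
```

Because each incoming link is divided by the sum of the raw links into the same node, the normalized links into node 'j' sum to 1 - L jj regardless of the scale on which the raw values were scored.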

PILOT  STUDY  OVERVIEW 

Taxonomy  Used 

It  was  the  author's  intent  to  identify  the  pathways  through 
which  research  programs  could  impact  technology  areas  and 
eventually  naval  and  other  application  or  mission  areas.  In 
parallel,  some  quantification  of  the  impact  of  these  programs  was 
desired.  A  complete  study  would  have  required  hundreds  of  nodes, 
many  experts  or  other  sources  of  the  raw  link  value  input  data,  and 
large  amounts  of  data  handling  and  entry.  As  a  first  step,  to  test 
the  feasibility  of  the  overall  method,  a  small-scale  pilot  study 
was  performed.  Research  and  technology  levels  were  included  in  the 
computational  network;  missions  were  not  included.  The  final 
research  taxonomy  selected  for  the  study  was  identical  to  the 
categorization  which  the  Office  of  Naval  Research  used  for  research 
management  purposes  at  the  time  of  the  study.  The  final  technology 
taxonomy  selected  for  the  study  was  similar  to  functional  element 
breakdowns  used  in  the  past  by  Navy  exploratory  development 
programs  for  management  purposes.  These  two  taxonomies  had  the 
virtue  of  being  fairly  comprehensive  in  their  coverage,  at  least  as 
far  as  the  Navy  is  concerned,  and  there  were  in-house  experts 
available  to  provide  preliminary  link  value  data  for  each  of  the 
subcategories  in  these  taxonomies.  Of  necessity,  the  taxonomy 
elements  used  were  very  broad.  Each  research  taxonomy  element 
(e.g., Mechanics) contained a number of different research programs
(e.g.,  Solid  Mechanics,  Fluid  Mechanics,  Energy  Conversion),  which 
themselves  could  have  been  divided  into  subprograms. 

Data  Acquisition 

The  data  was  obtained  by  personal  interview.  Each  in-house 
expert  was  provided  with  a  list  of  the  42  research  and  technology 
nodes,  and  was  asked  to  estimate  the  importance  of  results  produced 
from  all  the  other  nodes  on  his  particular  node  of  expertise.  The 
expert  was  asked  to  provide  a  number  which  served  as  a  measure  of 
impact based on the following scoring scale: Crucial (10); Very
Important (8); Important (6); Moderately Important (4); Slightly
Important (2); Negligible (0). Definitional uncertainties were
minimized  due  to  the  presence  of  the  interviewer. 

Because the approach is based on subjective judgment, there
are  limitations  to  the  validity  of  the  data,  especially  with  the 
small  numbers  of  experts  per  node  that  were  employed.  There  was  no 
attempt  made  to  normalize  the  responses,  and  an  impact  that  one 
expert  labeled  Important  could  have  been  labeled  Moderately 
Important  by  another  expert.  There  was  no  attempt  to  gauge  the 
degree  of  expertise  of  each  respondent  relative  to  his  field  of 
expertise,  and  the  numerical  ratings  supplied,  therefore,  carry 
different  degrees  of  validity.  Because  of  the  broad  discipline 
coverage  of  each  node,  the  expertise  of  any  respondent  relative  to 
the  breadth  of  the  discipline  was  quite  limited.  Use  of  a  small 
number  of  experts  per  node  did  not  provide  a  good  statistical 
representation  of  how  each  technical  community  would  have  perceived 
impact  on  its  discipline. 

Because  of  the  rapid  convergence  of  the  link  fractional  value 
multiplication  process,  it  was  found  that  timely  and  accurate 
results  could  be  obtained  with  networks  whose  longest  paths  were 
three links in length. Including a fourth link changed the results
by only a few percent.
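
The convergence behavior can be illustrated with a toy calculation. The three-node network and its link values below are hypothetical (chosen so that the normalized links into each node sum to 0.4); they are not pilot study data, and `impact_by_length` is an illustrative name. Paths are allowed to revisit nodes.

```python
def impact_by_length(links, a, b, max_links):
    """Return [c1, c2, ...], where ck is the summed IMPACT of a on b over k-link paths."""
    reach = {a: 1.0}  # summed link-value products of all k-link paths from a, by end node
    contributions = []
    for _ in range(max_links):
        step = {}
        for src, weight in reach.items():
            for dst, value in links.get(src, {}).items():
                step[dst] = step.get(dst, 0.0) + weight * value
        contributions.append(step.get(b, 0.0))
        reach = step
    return contributions


# Hypothetical normalized link values; links into each node sum to 0.4.
links = {
    "Math": {"Physics": 0.3, "Lasers": 0.2},
    "Physics": {"Math": 0.4, "Lasers": 0.2},
    "Lasers": {"Physics": 0.1},
}
parts = impact_by_length(links, "Math", "Lasers", max_links=4)
three_link_total = sum(parts[:3])
print([round(c, 4) for c in parts])
print(f"fourth link adds {parts[3] / three_link_total:.1%} to the three-link total")
```

In this toy case each additional link contributes less than the one before, and the fourth link adds only a few percent to the three-link total. How quickly a real network converges depends on its link values.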

Lessons  Learned  from  Pilot  Study 

The  results  from  the  pilot  study  are  described  in  detail  in 
Kostoff  [1994i].  The  lessons  learned  from  the  pilot  study  will  now 
be  described.  The  pilot  study  was  limited  by  a  number  of  factors, 
especially  the  broad  coverage  of  each  node.  To  expand  the  scope  and 
capabilities  of  the  study  methodology  to  the  point  where  study 
results  could  support  credibly  the  prioritization  of  research  areas 
and produce a more evidentiary basis for establishing program
balance,  the  following  steps  would  be  required  at  a  minimum. 

First,  the  research  and  technology  nodes  need  to  be  subdivided 
to  improve  resolution.  The  second  major  improvement  required  over 
the  pilot  study  is  the  addition  of  missions  nodes  to  the  network. 
The  third  improvement  is  that  research,  technology,  and  missions 
taxonomies need to be better orthogonalized, so that overlaps among
nodes and resultant skewing of the results are minimized. Fourth,
the number and range of experts per node need to be expanded to
provide more representative coverage of each node than the one or
two experts per node provided in the pilot study. The fifth
improvement is that the written material supplied to the
respondents needs to be sharpened, especially in the absence of an
interviewer.

Operational  Value  of  Present  Approach 

The  final  issue  in  this  section  addresses  the  operational 
value  of  the  present  approach.  When  the  pilot  study  was  proposed, 
the  type  and  significance  of  results  finally  obtained  were  never 
expected.  As  the  study  proceeded,  much  information  about  the 
interlocking  nature  of  research  and  technology  was  obtained  in 
addition  to  that  provided  on  the  questionnaires.  Thus,  much  of  the 
study's  value  derived  from  the  performance  of  the  study,  and 
additional  study  benefits  would  be  expected  from  a  refined  study. 

From  another  perspective,  a  refined  study  could  serve  as  a 
total  program  assessment.  It  could  identify  gaps,  duplications, 
promising  research  areas,  and  funding  priorities  for  the  total 
program  taken  as  a  whole.  The  typical  technical  assessment 
performed  today  focuses  on  a  technology  or  research  area,  and 
defines  required  research  to  allow  attainment  of  technology  and 
mission  objectives.  However,  in  the  zero-sum  game  environment  of 
finite  resource  constraints,  money  to  fund  the  required  research 
identified  by  the  assessment  has  to  be  taken  away  from  proposed  or 
existing  research  in  some  other  area.  Unless  the  total  impact  of 
unfunding  this  other  research  can  be  identified,  it  is  not  clear 
whether  the  overall  research  program  would  benefit  by  funding  that 
research  identified  by  the  technology  assessment.  In  fact,  it  is 
evident  that  unless  all  technology  and  research  are  assessed 
simultaneously,  funding  reallocations  based  on  one  or  two  specific 
technology  assessments  could  be  highly  suboptimal  and  misleading 
and  could  affect  the  overall  research  program  adversely.  A  refined 
study  could  serve  as  a  total  research  and  technology  assessment, 
performed  at  the  project  level,  and  may  perhaps  be  the  only 
sensible  way  to  perform  a  technical  assessment. 

NETWORK  MODELING  FOR  ROADMAPS 

This  section  includes  contributions  from  MR.  ROBERT  J.  ZURCHER 
AND  DR.  RONALD  N.  KOSTOFF. 

Introduction 

One  of  the  motivations  for  research  assessment  and  evaluation 
studies  is  to  gain  a  better  understanding  of  the  potential  myriad 
impacts  of  the  research,  and  then  use  this  understanding  to  help 
accelerate  the  transition  of  the  research  to  useful  technology. 
Accelerating  the  conversion  of  science  to  technology  has  three 
essential  elements:  1)  Information  about  the  science  must  exist  and 
be  readily  available  to  potential  users;  2)  The  need  for  the 
converted  science  (technology)  must  exist;  3)  One  or  more 
entrepreneurs  who  recognize  the  need,  who  understand  the 
relationship  between  the  need  and  the  science,  and  who  are  willing 
to  obtain  the  necessary  resources  and  accept  the  risks  inherent  in 
further  development  of  the  science,  must  be  available  to  champion 
its  further  development. 

Large  databases,  which  describe  ongoing  and  completed 
research, are commercially available (e.g., journal paper
abstracts, federal project and program narratives). With global
competition for markets, the need for new technology has never been
greater, and many compendia of projected technology requirements
are available (National Academy of Science/Engineering Studies,
Agency Requirements Documents, etc.).

However,  availability  of  research  and  requirements  information 
is  not  sufficient  to  motivate  potential  entrepreneurs  to  invest 
time  and  other  resources  in  the  high  risk  research  conversion 
process.  Investors  must  be  convinced  that  the  considerable  front- 
end  risk  of  science  conversion  is  more  than  justified  by  the 
potential  payoff.  Placement  of  the  science  conversion  step  into 
the  larger  pathway  from  research  to  high-payoff  applications  is  a 
key  component  for  eliciting  investor  interest.  While  relatively 
large  resources  have  supported  the  development  of  the  research 
databases,  and  substantial  study  efforts  and  market  surveys  have 
contributed  to  the  volumes  of  existing  requirements,  relatively  few 
efforts  have  focused  on  fusing  together  requirements  with  research 
systematically. 

There  are  fundamental  reasons  why  little  progress  has  been 
made  on  methodologies  to  identify  the  characteristics  of  these 
linkages.  The  pathways  between  research  and  eventual  applications 
are  many,  are  not  necessarily  linear,  and  require  significant 
amounts  of  data  [Kostoff,  1994i;  previous  section  on  network 
modeling] .  Substantial  time  and  effort  are  required  to  portray 
these  links  as  accurately  as  possible,  and  substantial  thought  is 
necessary  to  articulate  and  portray  this  massive  amount  of  data  in 
a  form  comprehensible  to  potential  investors.  Recently,  desktop 
high  speed  computers  with  large  storage  capabilities,  intelligent 
algorithms  for  manipulating  data,  and  other  tools  have  become 
available  to  allow  these  research-capabilities  pathways  (roadmaps) 
to  be  constructed  and  portrayed  efficiently  and  effectively,  and  to 
be  used  as  a  basis  for  more  detailed  analysis. 

The  main  value  of  these  decision  aids,  or  roadmaps,  in  the 
science  conversion  process  is  to  promote,  at  all  phases  of  the 
roadmap  development  process,  champion/  investor  interest  in 
developing  the  research  further.  In  planning  the  roadmap,  thought 
has  to  be  given  to  all  its  structural  elements,  including  the 
extent  of  the  development  required,  any  trade-offs  or  opportunities 
lost,  and  potential  costs  and  payoffs.  In  building  the  roadmap, 
experts  in  the  different  levels  of  development  and  payoff  become 
involved,  and  the  risks,  potential  costs  and  benefits  are  clarified 
further.  When  the  completed  roadmap  is  distributed  to  interested 
parties,  decisions  to  pursue  the  science  conversion  can  be  made 
with  greater  understanding  of  the  larger  development  context.  For 
a  more  comprehensive  discussion  of  roadmaps,  see  Science  and 
Technology  Roadmaps  [Kostoff,  1997p]  on  the  Internet. 

Retrospective studies of successful innovation have shown that
at least one champion is required to ensure continuity and
persistence toward the final goal [Kostoff, 1997j]. Other studies
have shown that two champions are preferable, one from the
technology-push side and the other from the requirements-pull side
[Rubenstein, 1997]. In reality, at least three major
parameters govern the role and impact of champions on the
science conversion process. The first is numbers: the more
champions, the more likely the conversion process is to be
supported. The second is intensity: the more intense the interest
and persistence of the champion(s), the more likely the research
is to proceed. The third is influence: the greater the influence
of the champion(s), the more likely it is that the research
conversion will be pursued.

Having potential champions involved in the planning,
development, and distribution of the roadmap makes it more likely
that the numbers, intensity, and influence of champions will
increase if analysis of the roadmap shows downstream potential for
substantial payoff. If roadmap analysis does not show convincing
evidence of payoff of the research toward the objectives, either
because the potential payoff is intrinsically lacking or because
those constructing the roadmaps are unaware of it, then the
research may not proceed further. If the roadmap analysis shows
high potential payoff, but with extremely high front-end risk and
costs, then champion interest may be limited to government for the
initial risk-lowering development phases.

This section gives an overview of the algorithmic component and
analytic potential of the Graphical Modeling System (GMS), a
computer-based process for generating and analyzing roadmaps which
link research to technology and eventually to
capabilities/requirements. This process has been under development
for the past five years [Zurcher, 1997], and its algorithmic
component is based on a directed graph/network model of
research/technology/capabilities/requirements. It uses the latest
relational database/hypertext technology to identify the potential
pathways which link research to higher development categories and
specific requirements/targets of interest. The algorithmic
component presently resides on a PC and requires 16Mb RAM, 5Mb of
disk storage, and a minimum of ^Mb of disk storage for an
uncomplicated technology roadmap.

In  the  past,  many  methods  have  been  developed  to  select  or 
evaluate  R&D  projects  [Fahrni,  1990;  Cooley,  1986;  Jackson,  1983; 
also  see  references  in  previous  section  on  Network  Modeling] . 
These  methods  typically  use  simple  checklists,  scoring, 
cost/benefit  analysis,  mathematical  programming  or  decision  trees 
to  determine  future  value  from  a  current  investment.  Other  methods 
describe  the  value  of  R&D  projects  by  attempting  to  measure  the 
effectiveness  of  transfers  of  technology  [Spann,  1995]  without 
explicitly  taking  into  account  customer  requirements.  Some 
algorithms  link  research  programs  to  end  uses/  capabilities/ 
requirements  [Thomas,  1996;  Barker,  1995].  This  last  method  1) 
creates  a  context  within  which  technology  projects  exist,  2) 
requires  a  flexible  technology  assessment  methodology  since 
requirements  change  and  emerging  technologies  will  modify  current 
plans,  and  3)  demands  continual  dialog  between  customers  and 
developers.  As  shown  in  the  previous  section  on  network  modeling, 
in the classical matrix approach [Dean, 1972], impacts flow
monotonically upward in the development chain (research ->
technology -> capabilities -> requirements/end targets), and in
the network/directed graph approach [Kostoff, 1994i], impacts are
allowed to flow upward, downward, or laterally in the development
chain (e.g., research -> technology -> research -> research ->
technology -> capabilities). GMS is able to show the node-link
relationships  of  both  the  matrix  and  network  approaches  (where  a 
research  or  technology  project,  or  a  capability,  is  treated  as  a 
node  in  a  network,  and  the  impact  of  one  project  [node]  on  another 
project  [node]  is  portrayed  as  a  quantified  link  in  the  network) . 
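
The node-link representation described above can be sketched as a
small directed graph. This is an illustration only, not the GMS
implementation: the class name Roadmap, the four level names, the
example node names, and the 0-to-1 link values are all assumptions
made for the example.

```python
from collections import defaultdict

# Hypothetical development levels, ordered from research to requirements.
LEVELS = ["research", "technology", "capability", "requirement"]


class Roadmap:
    """Directed graph: nodes are projects, capabilities, or requirements;
    each link carries a quantified impact value (assumed to lie in 0..1)."""

    def __init__(self):
        self.level = {}                 # node name -> level name
        self.links = defaultdict(dict)  # source -> {target: link value}

    def add_node(self, name, level):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level[name] = level

    def add_link(self, source, target, value):
        if source not in self.level or target not in self.level:
            raise KeyError("both ends of a link must be existing nodes")
        self.links[source][target] = value

    def direction(self, source, target):
        """Classify a link as upward, downward, or lateral in the chain."""
        step = LEVELS.index(self.level[target]) - LEVELS.index(self.level[source])
        if step > 0:
            return "upward"
        return "downward" if step < 0 else "lateral"


# Invented example nodes and link values.
roadmap = Roadmap()
roadmap.add_node("electrochemistry research", "research")
roadmap.add_node("materials research", "research")
roadmap.add_node("battery technology", "technology")
roadmap.add_link("electrochemistry research", "battery technology", 0.7)
roadmap.add_link("battery technology", "materials research", 0.3)  # feedback
roadmap.add_link("materials research", "electrochemistry research", 0.4)
```

A matrix-style roadmap would contain only links for which direction()
returns "upward"; the network approach also admits the "downward" and
"lateral" links shown in the example.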

In  addition,  GMS  adds  a  crucial  new  capability,  termed 
Multiple  Perspectives  (MP) .  In  GMS,  the  nodes  (projects/ 
capabilities/  requirements)  are  treated  as  multi-valued  (multi- 
attributed)  quantities,  and  are  allowed  to  exist  in  many  different 
research-requirement  pathways  simultaneously.  This  MP  capability 
provides  a  more  accurate  depiction  of  the  multi-application  nature 
of  most  research  and  technology.  The  user  of  GMS  is  now  able  to 
highlight  only  the  specific  node-link  subnetworks  of  interest  (the 
desired  research-requirement  pathways)  without  being  overwhelmed  by 
the  massive  data  which  constitutes  the  larger  network. 

For example, the MP capability enables the user to select
which research-requirements pathways to view (e.g., a 'top-down'
requirements perspective or a 'bottom-up' science/technology
perspective), rather than viewing all nodes and links at once,
which can obscure the picture, or being limited to a static
display that cannot change. Researchers can 1) observe the larger
context in which their work is being performed, or 2) identify new
application targets for their research, and make informed
decisions on how to proceed to maximize payoff for multiple
applications. The capability also allows the user and other
interested parties to identify the research and technology
projects which presently serve as obstacles to reaching desired
application targets in a timely manner.
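
The two perspectives mentioned above can be illustrated with a
simple reachability search over a link table. This is a sketch
under stated assumptions, not a description of how GMS implements
Multiple Perspectives: the function names, the node labels (R for
research, T for technology, C for capability, Q for requirement),
and the link values are all hypothetical.

```python
def reachable(links, start):
    """Return every node reachable from `start` by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in links.get(node, {}):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def reverse(links):
    """Flip every link so that impacts can be traced backwards."""
    flipped = {}
    for source, targets in links.items():
        for target, value in targets.items():
            flipped.setdefault(target, {})[source] = value
    return flipped


def bottom_up(links, research_node):
    """Everything a given research node can eventually impact."""
    return reachable(links, research_node)


def top_down(links, requirement):
    """Everything that can eventually contribute to a given requirement."""
    return reachable(reverse(links), requirement)


# Invented example network: source -> {target: link value}.
links = {
    "R1": {"T1": 0.8},
    "R2": {"T1": 0.5, "T2": 0.6},
    "T1": {"C1": 0.7},
    "T2": {"C2": 0.9},
    "C1": {"Q1": 0.9},
    "C2": {"Q2": 0.4},
}

view_from_r1 = bottom_up(links, "R1")   # {"T1", "C1", "Q1"}
view_for_q2 = top_down(links, "Q2")     # {"C2", "T2", "R2"}
```

Each view contains only the subnetwork relevant to the chosen
starting point, so the rest of the network does not have to be
displayed.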

Methodology 

The  roadmap,  or  graphical  model,  overviewed  here  is  a 
selected  set  of  requirements,  links  and  R&D  projects  that 
describes  the  state  of  technology  development  and  potential 
transfer  in  a  coherent  area.  It  could  be  composed  of  a  single 
requirement  for  a  system  linked  to  corresponding  R&D  projects,  or 
it  could  encompass  multiple  requirements  linked  to  numerous 
projects. A graphical model visually portrays: requirements,
capabilities, and R&D projects in different development phases;
relationships between R&D projects and requirements; and
integration among related R&D projects.

The  GMS  depiction  of  the  science  conversion  process  is 
assembled  in  a  two-stage  process:  1)  Construction  of  a  graphical 
model;  2)  Analysis  of  the  pathway  elements  between  requirements 
and  R&D  projects. 

a.  Model  Construction: 

Model  construction  consists  of  identifying  the  projects  and 
requirements  (nodes)  for  the  roadmap,  then  identifying  the 
relationships (links) between the projects and requirements.

Step  1:  Identifying  Types  of  Projects  and  Requirements 

R&D  projects  and  requirements  are  partitioned  according  to 
the  phase  of  development  of  the  R&D  projects  and  to  the  level  of 
specificity  of  the  requirements.  While  the  actual  graphical 
models  used  employ  a  half-dozen  or  more  bands  for  subdividing 
project  and  requirement  types,  for  purposes  of  demonstration 
simplicity  the  roadmaps  shown  in  Zurcher  [1997]  have  four  levels: 
research,  development,  capability,  requirements. 

Constructing  the  roadmap  framework  (i.e.,  identifying  the 
specific  nodes  to  be  used  in  the  roadmap  and  the  placement  of 
those  nodes  at  the  appropriate  level  of  development)  is  perhaps 
the most challenging step in the roadmap development process. It
is somewhat paradoxical in that the appropriate expertise must be
employed to develop a roadmap, but the appropriate expertise
becomes fully known only after a complete roadmap has been
constructed. An iterative roadmap development process is
therefore essential. For an organization in which many of the
roadmap components are being pursued in-house, such as a large
focused government or corporate laboratory, much of the expertise
can  be  assembled  in-house.  Researchers,  developers,  marketers 
and  others  with  relevant  knowledge  of  the  overall  roadmap  theme 
can  be  readily  convened  to  develop  the  framework.  At  the  other 
extreme,  organizations  with  little  expertise  in  the  overall 
roadmap  theme,  such  as  venture  capital  groups  or  cash-rich 
organizations  that  wish  to  expand  their  boundaries,  will  require 
external  assistance  to  develop  credible  roadmaps. 

The  utility  of  a  roadmap  increases  as  it  expands  to  include 
potentially  relevant  R&D  performed  in  all  sectors  of  the 
technical  community.  The  experts  constructing  the  roadmap  can 
draw  upon  their  personal  experience  and  contacts  in  identifying 
other  R&D  performed  in  the  community,  and  should  utilize 
computerized  resources  such  as  program  narrative  databases  to 
identify  relevant  external  R&D.  The  quality  and  credibility  of 
the  roadmap  increases  as  more  experts  are  employed  in  its 
construction.  While  it  is  preferable  to  have  at  least  one  expert 
in  each  node  technical  area  (e.g.,  if  ELECTRO-CHEMISTRY  RESEARCH 
is  one  node,  then  at  least  one  expert  in  this  area  should  be  part 
of the roadmap development team), useful roadmaps can be
constructed with fewer contributors of broader expertise.

Experience  has  shown  that  major  benefits  accrue  during  the 
iterative  process  when  the  experts  are  convened  to  develop  the 
framework.  The  roadmap  serves  as  an  important  component  of  both 
strategic  planning  and  technological  forecasting  for  the 
organization,  and  forces  the  developers  to  clarify  conceptual 
strategic  targets  in  order  to  represent  them  graphically. 
Awareness among all the contributors of the R&D required and the
R&D available in other sectors of the technical community is
increased, sometimes dramatically. In particular, critical path
research can be identified, and support for its accelerated
development can be strengthened. The main value at this phase is
to  the  developers  themselves;  additional  value  accrues  when  the 
completed  roadmap  is  provided  to  external  users. 

Step  2:  Identifying  Links  Between  Projects  and  Requirements 

Once  the  full  complement  of  nodes  has  been  identified,  the 
next  step  is  to  graphically  and  quantitatively  depict  the 
relationships  among  the  nodes.  One  node  is  represented  as  linked 
to  another  node  when  the  results  emanating  from  the  first  node 
are  assumed  to  have  some  impact  on  the  achievement  of  targets  of 
the  second  node.  This  relationship  is  depicted  graphically  by  a 
line,  or  link,  connecting  the  two  nodes,  and  is  quantified  by 
assigning  a  value  to  the  link  (e.g.,  Kostoff,  1994i) .  It  is 
important  that  node  experts  from  both  ends  of  the  link  (the 
results  generator  node  and  the  results  user  node)  are  involved  in 
assigning  the  link  value.  Finally,  the  inherent  hypertext 
capabilities  of  GMS  allow  more  descriptive  information  about  each 
node  and  node-connecting  link  to  be  accessed  at  the  touch  of  a 
button.  These  hypertext  capabilities  allow  the  rationale  for  the 
selection  of  each  node,  and  selection  of  node  and  link  values,  to 
be  obtained  easily,  and  thereby  provide  deeper  insight  to  the 
potential  obstacles  and  impediments  to  successful  research 
development  and  transition. 

It  is  assumed  that  the  experts  in  the  node  thematic  areas 
are  most  qualified  to  assign  values  to  the  links  entering  and 
exiting  their  particular  nodes  of  expertise.  Experience  has 
shown  that  most  credible  impacts  are  nearest-neighbor  (e.g., 
basic  research  node  outputs  tend  to  impact  applied  research 
nodes;  applied  research  node  outputs  tend  to  impact  early 
development  nodes) .  The  impact  of  research  on  far-neighbor 
nodes,  such  as  advanced  technology  projects,  tends  to  occur  along 
pathways consisting of nearest-neighbor steps. Thus, the
developed  network  consists  of  individual  node-link  subnetworks, 
each  of  which  has  been  assigned  node  and  link  values  by 
appropriate  experts. 
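
The way a far-neighbor impact is built up from nearest-neighbor
steps can be sketched as follows. The source does not state how link
values are combined along a pathway; multiplying them, as done
here, is an assumption made for illustration, as are the function
names, node names, and link values.

```python
def pathways(links, source, target, path=None):
    """Yield every simple (non-repeating) path from source to target."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in links.get(source, {}):
        if nxt not in path:  # skip nodes already visited on this path
            yield from pathways(links, nxt, target, path)


def strength(links, path):
    """Assumed aggregation rule: multiply the link values along the path."""
    value = 1.0
    for a, b in zip(path, path[1:]):
        value *= links[a][b]
    return value


# Invented link values assigned by hypothetical node experts.
links = {
    "basic research": {"applied research": 0.8},
    "applied research": {"early development": 0.5, "basic research": 0.2},
    "early development": {"advanced technology": 0.5},
}

found = list(pathways(links, "basic research", "advanced technology"))
impact = strength(links, found[0])
```

Here the only route from basic research to advanced technology runs
through applied research and early development, so the far-neighbor
impact is composed entirely of nearest-neighbor link values.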

Conceptually,  however,  the  developed  network  is  greater  than 
the  sum  of  its  nodes,  just  as  the  living  human  body  is  greater 
than  the  sum  of  its  component  molecules.  The  developed  network 
includes  the  intelligence  or  inherent  logic,  as  quantified  by  the 
link  values,  which  connects  the  nodes  to  each  other  and  to  the 
overall  mission  goals,  just  as  the  living  human  body  includes  the 
intelligence  which  links  the  molecules  to  each  other  and  to  the 
homeostatic  operation  of  the  body.  As  a  result  of  the  expert 
intelligence  applied  to  quantifying  each  node  value  as  well  as 
the  entering  and  exiting  link  values,  there  are  at  least  two  new 
crucial  pieces  of  information  provided  by  the  developed  network: 
1)  The  strength  of  the  relationships  among  the  projects/ 
capabilities/  requirements  and  the  subsequent  identification  of 
high obstacle and low obstacle paths; 2) Identification of R&D
projects being conducted external to the organization, their
importance to successful attainment of the organization's goals,
and their potential for leveraging by the organization. Even
when  node  experts  have  not  been  identified  or  cannot  be  obtained, 
valuable  information  about  gaps  in  expertise  availability  has 
been  generated.  The  developed  network  with  its  enhanced 
information  content  now  serves  to  promote  communications  among 
all  the  participants  and  provide  a  stronger  basis  for  credible 
analysis  and  decisionmaking. 

b.  Model  Analysis 

A  variety  of  analyses  can  now  be  performed,  limited  only  by 
the  interests  and  imagination  of  the  analysts.  The  quantified 
network,  which  contains  a  comprehensive  collection  of  nodes,  can 
serve  as  the  foundation  for  detailed  economic  studies,  broad 
systems  studies,  and  parametric  tradeoff  studies.  The  initial 
utilization  of  the  network  should  serve  to  foster  internal 
communications  and  consensus,  in  preparation  for  these  more 
detailed  analyses. 

Obviously,  the  breadth  of  information  obtained  from  the 
different  perspectives  will  be  limited  by  the  contents  of  the 
total  database.  In  an  ideal  world,  all  existing  and  proposed  R&D 
programs  would  be  entered  in  the  overall  database,  and  the  full 
impact  on  technology  and  capabilities  of  existing  and  proposed 
research  programs  would  be  identified.  In  addition,  the  total 
R&D  available  to  address  required  goals  and  capabilities  would  be 
displayed.  Because  of  all  the  potential  node-link  combinations, 
and  the  attendant  enormous  amount  of  data  required  (Kostoff, 
1994i) ,  constructing  this  complete  database  is  not  feasible  at 
present.  However,  the  central  thesis  of  the  present  paper  is 
that subsets of the total database embedded in the larger
analytical  process  still  have  substantial  value.  The  existing 
GMS  has  a  total  R&D  database  constructed  from  the  different 
specific  mission  application  perspectives  which  have  been 
performed,  and  increases  in  value  for  an  organization  as  more 
perspectives  are  generated. 

The  value  of  graphical  models  is  that  they  show  R&D  projects 
and  requirements  in  context  rather  than  in  isolation,  they  can 
depict  new  perspectives  rapidly,  and  they  can  serve  as  a  focal 
point  for  enhanced  communications  and  more  detailed  total  systems 
analyses.  Since  the  context  of  graphical  models  is  different  for 
each  perspective  while  still  using  common  elements  (projects, 
capabilities,  requirements) ,  comprehending  a  broad  R&D  program 
and  associated  requirements  is  very  difficult  without  the  ability 
to  sort  out  these  elements  and  how  they  relate  to  one  another. 

Summary  and  Conclusions 

Transferring  technology  to  customers  efficiently  through  a 
succession  of  autonomous  development  groups  requires 
extraordinary  coordination.  There  are  many  opportunities  for 
technology  transfer  to  become  stalled  at  any  point  along  the  way 
by  disparate  priorities  among  many  groups.  Depicting  potential 
science  conversion  in  a  graphical  model  discloses  to  the 
scientists and investors alike the possible transfer points where
obstacles  may  occur  to  technology  transfer  or  requirements 
specification  [Geisler,  1995]. 

The benefits of graphical modeling include: 1) showing R&D
projects and requirements in context rather than in isolation,
2) using multi-attributed nodes to portray different
research-requirement pathways rapidly, 3) serving as a focal
point for enhanced communications and more detailed total systems
analyses, 4) promoting champion/investor interest, 5)
portraying R&D programs as being strategically planned, 6)
portraying leveraging of R&D projects from other organizations,
and 7) identifying obstacles to rapid and low-cost technology
development.


EXPERT  NETWORKS 

Research  Impact  Assessment  is,  at  its  essence,  a  diagnostic 
process  with  many  diagnostic  tools.  In  other  fields  of  endeavor, 
such  as  Medicine  and  Machinery  Repair,  expert  systems  are 
increasingly  being  used  as  diagnostic  tools  or  as  support  to 
diagnostic  processes.  Recently,  there  have  been  efforts  to 
develop  expert  system  approaches  combined  with  artificial  neural 
networks  (expert  networks)  for  use  in  R&D  management,  including 
RIA  [Odeyale,  1993;  Odeyale  and  Kostoff,  1994a,  1994b].  These 
efforts  will  be  summarized  in  this  section.  Much  of  the 
remainder  of  this  section  was  contributed  by  Dr.  Charles  Odeyale, 
a  true  visionary  in  the  application  of  Expert  Networks  to  the 
broad  area  of  R&D  management. 

Overview 

To  increase  the  degree  to  which  rationality  is  used  to  guide 
decisions,  the  authors'  efforts  have  been  directed  towards  a 
comprehensive  R&D  management  tool,  a  high-tech  Peer  Review, 
through  a  modified  version  of  a  previous  Office  of  Naval  Research 
review process. The product of these efforts is the Research-
Management Expert Network (R-MEN), which is characterized by two
complementary tools: Organizational/Professional Development and
Expert Network. The latter technology is comprised of an expert
system (left side brain) and an artificial neural network (right
side brain). Given a set of research and research management
policies and strategies, R-MEN learns concepts that
hierarchically organize those policies and strategies and uses
them in classifying/triaging research proposals. A brief and
non-technical  description  of  how  this  knowledge  technology  would 
foster  continuous  "learning",  improve  value  and  efficiency, 
increase  productivity,  and  provide  excellent  performance  measures 
of  activities  is  presented. 

Introduction 

There  is  much  concern  about  improving  the  health  of  basic 
research.  The  increasing  politicization  of  the  support  of 
research  has  awakened  many  organizations  to  the  risks  and 
realities of survival. There is a growing sentiment that it is
no  longer  enough  that  research  just  be  excellent,  or  generate  new 
information;  research  must  contribute  results  aimed  toward 
national  goals.  Research  and  Development  (R&D)  administrators 
and  managers  need  a  powerful  management  tool  to  enable  them  to 
predict, assess and monitor the impact(s) of research results and
research  management  processes  at  the  project,  program, 
organizational,  and  national  levels. 

As administrators and managers struggle to establish
policies and strategies that balance cost issues with research
outcomes, establishing systems to predict, assess and monitor the
impact(s) of research results and research management processes
should be an important consideration. The authors have found that
successful outcomes-management systems require five basic
components, namely, openness to change, a specification process,
information/knowledge technology, measurement instruments, and
continuous learning and improvement. For greater processing
power, immediate access to information, and powerful applications
that monitor, analyze, and manage, the authors have reported
[Odeyale, 1993; Odeyale and Kostoff, 1994a, 1994b] a technology
whose functionalities surpass these requirements. This value and
efficiency improvement technology, which is a comprehensive
computer-based Research Impact Assessment (RIA), is characterized
by two compound, mutually complementary tools: Organizational/
Professional Development (O/PD) and Expert Network (EN).

The  framework  of  Research-Management  Expert  Network 
(R-MEN)  was  reported  by  Odeyale  and  Kostoff  in  the  references 
cited  above.  It  consists  of  a  knowledge  base  and  a  data  base. 
Feeding into the knowledge base are four modules: a policy/
strategy impartation module and a proposal data acquisition
module, both of which receive input from the O/PD process; and a
research impact calculation module and a proposal review module.
The  knowledge  base  then  feeds  into  the  data  base  through  five 
modules:  a  project  selection  module,  resources  allocation  module, 
project  evaluation  and  control  module,  investigator  evaluation 
module,  and  organization  evaluation  module. 

Within the framework of Research-Management Expert Network
(R-MEN), O/PD pertains to the relevance, transferability, and
system alignment of the training and development efforts of each
and every individual in the organization. Most importantly,
these criteria of timely selection, training and development of
individuals are taken in conjunction with changes in
organizational environments and requirements. Through O/PD,
attitudinal, behavioral, procedural, policy, and structural
barriers are uncovered and "removed" to enable effective
performance at all levels. To effectively manage this continuous
"learning", improve value and efficiency, increase productivity,
and provide excellent performance measures of activities, an
information/knowledge technology is needed. All these needs, and
more, are met by the EN, which is comprised of an expert system
(left side brain) and an artificial neural network (right side
brain). This integration of information processing techniques
avoids the limitations of each technique while capitalizing on
their unique benefits. Expert Systems, and Knowledge-Based
Systems in general, including artificial neural networks, are
computer programs that deal with complex problems ordinarily
solved by human experts who are highly skilled, trained, and
experienced in the specific area of interest.

The conceptual construct that provides the framework for the
O/PD-based research management processes is described in three
phases as shown in Table 1.

Table 1. PARTICIPATIVE R&D MANAGEMENT PROCESS

PHASE             PROCESS                  MANAGEMENT LEVEL                            MANAGERIAL STYLES
----------------  -----------------------  ------------------------------------------  ------------------------
I.   Position     a. Pre-Vision            Sr. Executives (with R-MEN)/Sr. Scientists  Authoritative
     Audit        b. Strategic Vision      Sr. Executives (with R-MEN)/Sr. Scientists  Democratic
                  c. Design & Planning     Sr. Executives (with R-MEN)/Sr. Scientists  Democratic/Authoritative
II.  R&D Process  d. Introduction          R&D Director
                  e. Implementation        Sr. Scientists/Bench Level Investigators
III. Control      f. Evaluation & Control  Sr. Executives (with R-MEN)/Sr. Scientists

The above steps and components are identified to facilitate the
development of accurate activity standards to be used in
tracking, evaluation and control to foster accountability and
productive efficiency. The general outline of the processes is in
the spirit of the reports of Dubnicki and Williams [1991], Englert
[1991], and Kostoff [1992a]. The phases are briefly described
below; see Odeyale [1993] for details.

PHASE  I 

This  phase  includes  the  development  of  the  strategic  plan, 
which  defines  and  communicates  longer-term  research  directions,  and 
the  development  of  the  operating  plan,  which  specifically 
identifies  the  projects  that  will  implement  the  strategic  plan 
taking  into  consideration  the  goals,  quantifiable  objectives  and 
development  of  the  individual  investigator  and  the  organization. 
A series of processes with interlacing feedback and feed-forward
loops operates during this phase, including:

1. Formation of a top-management pre-vision team composed of
theorists, technologists and practitioners who must demonstrate
interest in and commitment to this process and the RIA program as a
whole. This team must be able to explain the "whys" behind
directions or decisions in terms of the employees' and/or the
organization's interests. Top management must include in their
considerations: a) the uncertainties of innovation and the
environments; b) the recognition of technology push (the brilliant
idea seeking a field/market) and field/market pull (a field/market
need seeking a product), and what the general corporate climate or
attitude is on projects based on either; c) the determination of
attributes, and formation of attribute tables with the disciplines
or sciences which are determined to be absolutely necessary in the
support of R&D unique to the organization.

2.  Transformation  of  research,  and  research  management  policies 
and  strategies  into  key  terms  that  are  used  later  in  proposal  text- 
body  content  analysis.  Policies  and  strategies  may  include  the 
research  direction,  preferred  research  technology,  goals, 
objectives,  values,  etc. 
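
A very simple form of key-term content analysis can be sketched as
follows. This is only an illustration of matching policy-derived
key terms against proposal text; it is not the method used by
R-MEN, and the function names, the example terms, the example
proposal text, and the relevance index are all assumptions.

```python
import re
from collections import Counter


def key_term_counts(text, key_terms):
    """Count occurrences of each single-word key term, ignoring case."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {term: words[term.lower()] for term in key_terms}


def relevance(text, key_terms):
    """Assumed crude index: fraction of key terms found at least once."""
    counts = key_term_counts(text, key_terms)
    return sum(1 for n in counts.values() if n > 0) / len(key_terms)


# Hypothetical key terms derived from policies and strategies.
policy_terms = ["corrosion", "sensor", "acoustic"]
proposal = "We propose a corrosion sensor; the sensor is tested in seawater."

counts = key_term_counts(proposal, policy_terms)
index = relevance(proposal, policy_terms)
```

In this example two of the three key terms occur in the proposal
text, so the index is two thirds.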

3. Machine learning of the policies and strategies by R-MEN, whose
method  of  learning  is  incremental  concept  formation.  The  policies 
and  strategies  are  grouped  by  research  area  as  they  are  learned. 
They  become  a  form  of  long  term  memory  that  remains  the  same  until 
a  change  in  policy  and  strategy  is  recognized  and  implemented  by 
the  management. 

4. Collection of contract/grant applications through a
Bulletin-Board-Service-like client/server system. From anywhere in
the world, through software such as "PC ANYWHERE", individual
investigators can call in to fill out electronic grant application
forms that visually resemble their paper counterparts. In
addition, the bottom of the forms and/or the last page contain(s)
control buttons for the collection of prediction/assessment related
data which are needed for network computing, such as benefit,
contribution, feasibility, need, impact value, and proposal index
value calculations. This same method is used for the collection of
proposal review and evaluation/monitoring related data, such as
the solicitation of quantifiable opinions from reviewers and of
quantifiable objectives from individual investigators. For example,
investigator objectives are projected and quantified for each
evaluation period (one year) as follows:

a) No. of Poster Presentations (0.5 point each);

b) No. of Abstract Publications (1 point each);

c) No. of Paper Publications (1.5 points each);

d) No. of Graduate Seminar Lectures (2 points for each
"once-a-week, one-semester" lecture series);

e) No. of Developments (2 points each);

f) No. of Patent Applications (3 points each).

As an element of vision, the top management may envision or
set as objectives for the whole (private or public) organization
300 publications, 450 published abstracts, 200 poster displays at
major scientific and/or engineering society meetings, 10
developments, and the assignment of at least three patent rights in
a one-year period. All objectives must be in line with those of
the organization. Once the forms are completed and the "SEND"
button is pressed, with appropriate warnings, access to the
application forms is denied.

5. The applications are grouped by research area as they are
collected. At the end of the funding agency's published collection
period, coded policies and strategies are used in the text-body
content analysis of each proposal. That is, R-MEN will search
the text-body of each application for the coded key terms, counting
and adding only one instance of each key term. A major concern
about the use of this technique is that investigators who know the
key terms may write their proposals directly to address the key
terms. Ideally, that is what the administration should require,
i.e., the alignment of the investigators' goals and objectives with
those of the organization. Besides, the investigators must meet
their projected quantified objectives if they want their projects
funded the next time around. This is outcomes-management, placing
greater reliance on standards and guidelines. Furthermore, such
resourceful proposal writing will be revealed during the feasibility,
need, and benefit calculations described below. In any case, the
result of this content analysis is a triage: it changes the state of
the application to either exclusion from or inclusion in the further
review process.


6. For R&D Area-Science Relationships (feasibility),
Science-Requirement Relationships (need), and Requirement-Value
Relationships (benefit), a portion of R-MEN's inference technique
uses a modified version of the Multiattribute Utility Technology
(MAUT) to obtain electronically the views of experts (from
universities, government and industries), respectively, on: a) the
potential impact of breakthroughs in a research area on
disciplines and specific research subjects; b) the contribution of
the science to satisfying operational requirements through
suggested research opportunities (proposals); and c) the magnitude
of the contribution of a set of proposals to satisfying a set of
needs. Refer to Edwards [1980, 1982] for detail on MAUT. When a
reviewer calls in to contribute his/her opinion to the opinion
table, he/she will be asked to: i) review the provided list of value
disciplines and areas of interest in terms of their being
affected by any research breakthrough in one of the areas of
interest (say, blood substitutes); ii) rank order the value
disciplines and provided areas of interest to reflect their being
affected by a research breakthrough in blood substitutes; and iii)
weigh the value disciplines: assign 10 points to the least
affected discipline, then assign accordingly the relative impact
of a blood substitutes research breakthrough on each other
discipline (the limit is 100, and as many as 100, 500, etc. experts
can "review" a proposal).

7. Before final proposal review and indexing, a means for
hypothesis testing is provided. This nonprimitive function
provides relationship Congruency or Entropy values ranging between
zero and a system-determined value, depending on the data provided.
It provides a choice of 99, 95, 90, 75 or 50% confidence level for
the calculation of the entropy value. A value of zero means that
the newly generated information/knowledge from MAUT-obtained data
adds relatively no usable information/knowledge to the existing
one: a research breakthrough in a project may contribute
insignificantly to a limited number of disciplines, i.e., there is no
cross-fertilization. Replacing the entry in the cell of interest
with a new value and repeating the calculation will generate a new
value which may or may not be acceptable. Thus, the function assists
in the identification of special problems to be addressed before
project selection. On the other hand, a value other than zero
indicates a level of added usable information/knowledge to the
existing one: a research breakthrough in a project may contribute
significantly to a number of disciplines, i.e., there is
cross-fertilization.

8. Impact and index values are calculated for each of the
applications using data including the investigator's performance
record, stated objectives, and desired outcomes. Every application
whose "CRITERIA MATCH" field is occupied is included in the
organization's R&D portfolio and automatically indexed based on its
impact and index values. If they have not already been entered, the
system will ask for the available resources and the minimum reserve;
it will then start assigning funds to projects, starting from the one
with the highest index value, until the minimum reserve is reached.
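
Three of the more mechanical steps above (the objective points of step 4, the key-term count of step 5, and the funding pass of step 8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R-MEN implementation: the function names, the dictionary keys, the whole-word matching rule, and the choice to stop at the first project that would breach the reserve are all hypothetical. The index values are taken as given inputs, since the text does not specify how they are computed.

```python
import re

# Point values per output type, taken from the list under step 4.
# The dictionary keys are hypothetical labels chosen for this sketch.
POINTS = {
    "poster": 0.5,
    "abstract": 1.0,
    "paper": 1.5,
    "seminar_series": 2.0,
    "development": 2.0,
    "patent_application": 3.0,
}


def objective_score(projected):
    """Weighted sum of an investigator's projected outputs for one period."""
    return sum(POINTS[kind] * count for kind, count in projected.items())


def key_term_hits(text, key_terms):
    """Count the distinct key terms found in a proposal's text body.

    Each term counts once, however often it appears (step 5).
    Assumption: case-insensitive, whole-word or whole-phrase matching.
    """
    lowered = text.lower()
    return sum(
        1
        for term in key_terms
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    )


def allocate(projects, available, minimum_reserve):
    """Fund projects in descending order of index value (step 8).

    projects: list of (name, index_value, cost) tuples.
    Assumption: the pass stops at the first project whose cost would
    push the remaining balance below the minimum reserve.
    """
    funded = []
    remaining = available
    for name, _index, cost in sorted(projects, key=lambda p: p[1], reverse=True):
        if remaining - cost < minimum_reserve:
            break
        remaining -= cost
        funded.append(name)
    return funded, remaining
```

For instance, an investigator projecting two posters, two papers, and one patent application would score 2(0.5) + 2(1.5) + 3 = 7 points for the evaluation period.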

PHASE  II 

This phase represents the necessary education and management
support needed to prepare the staff to participate in such an
"Action Research" effort. This phase identifies and utilizes the
critical  components  required  to  develop  an  environment  that 
facilitates  participative  research  management  activities.  A 
significant  activity  occurring  during  this  phase  is  daily 
verification  of  individual  scheduled  training  and  development.  If 
an  individual  has  no  recorded  training  and/or  development  within  a 
preset  period,  the  system  will  generate  and  send  a  report  through 
E-mail  directly  to  the  office  of  the  director  for  R&D.  The  system 
will be able to look at the training and/or development
description(s) and compare it/them with the background of the
individual to determine if the training and/or development is/are
suitable for that individual. This is one of the ways in which R-MEN
shows concern for human feelings and human needs for support,
dignity, and fulfillment in work.

PHASE  III 

This  phase  represents  a  means  by  which  participative  methods 
can  be  put  into  operation  in  developing  productivity  tracking 
systems. Significant activities occurring during this phase
include project evaluation and control. This entails periodic
monitoring of project milestones for applied research, and of research
objectives for the more basic research. If a project has no
recorded fulfillment of a milestone within a preset period, the
system will generate and send a report through E-mail directly to
the office of the director for R&D.
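
The periodic check described for Phases II and III (no recorded training, development, or milestone within a preset period triggers a report) might look like the following Python sketch. The function names, the data layout, and the report wording are assumptions made for illustration; the text does not describe how R-MEN stores its records or formats its E-mail.

```python
from datetime import date, timedelta


def overdue(last_recorded, today, preset_days):
    """Return the names with no recorded activity within the preset period.

    last_recorded maps a name (staff member or project) to the date of
    its latest record, or to None when nothing has been recorded at all.
    """
    cutoff = today - timedelta(days=preset_days)
    return sorted(
        name
        for name, last in last_recorded.items()
        if last is None or last < cutoff
    )


def report(names, subject):
    """Format a plain-text report body; empty when nothing is overdue."""
    if not names:
        return ""
    lines = [f"{subject}: no record within the preset period"]
    lines += [f"  - {name}" for name in names]
    return "\n".join(lines)
```

Delivery of the report is left out of the sketch, since the mail system involved is not specified.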

ANTICIPATED  BENEFITS 

Frequently  in  human  affairs,  past  intellectual  baggage  hinders 
our ability to forge novel approaches. Therefore, we advocate the
use of R-MEN concurrently with the present research review process.
During this period, R-MEN is foreseen as a supplement in the form
of a guide to data generation, acquisition and processing, and a
validity check. Before long, just as R-MEN's anticipated
review period is very significantly (62.5 - 66.67%) less than that
required by unaided review, other R-MEN benefits, including those
presented below, will stand out as well.

With  appropriate  implementation  and  maintenance,  this 
knowledge  technology,  which  utilizes  demonstrated  and  proven 
approaches,  methods,  procedures  and  techniques  in  an  innovative  and 
unique  way,  would: 

1. Provide a means for effective, policy- and strategy-oriented
management through outcomes-management.

2.  Improve  management  quality,  reduce  operation  costs,  and 
increase  productivity  and  public  trust. 

3. Foster impact evaluation to document Federally funded
program  and  management  effectiveness. 

4.  Provide  short-term  (three-year)  program  progress  tracking 
and long-term (ten-year) result(s) impact tracking.

5.  Shield  administrators,  managers,  and  other  policy-makers 
from  the  complexity  of  the  mathematics  of  the  inference  machine. 

6.  Permit  the  evaluation  of  a  range  of  alternatives. 

7.  Permit  handling  large  amounts  of  data. 

8.  Permit  policy-makers  to  have  a  better  understanding  of 
existing  technical  attributes  of  and  capabilities  for  potential 
projects. 

9.  Facilitate  choice  of  strategy  compatible  with  agency 
structure  and  processes,  and  with  the  policy  or  the  nature  of 
decision  making  for  activities  scheduling  and  control. 

According  to  Nonaka  [1991],  "In  an  economy  where  the  only 
certainty  is  uncertainty,  the  one  sure  source  of  lasting 
competitive advantage is knowledge. And yet ... few managers grasp
the  nature  of  the  knowledge-creating  company  -  let  alone  how  to 
manage  it.  The  reason:  They  misunderstand  what  knowledge  is  and 
what  companies  must  do  to  exploit  it." 

Is the reader up to date in strategic information/knowledge
technology application? Are the reader's strategy-structure and/or
reward and training systems barriers or opportunities to professional
and organizational success? Does the reader know how to integrate
information technology with research management processes?
This is where the authors' R-MEN technology comes in.

QUANTITATIVE METHODS - SUMMARY AND CONCLUSIONS

To summarize the quantitative methods section, few Federal
agencies report in the published literature the use of bibliometrics
to evaluate programs and influence research planning.
Cost-benefit and other economic approaches have been reported in the
published literature over the years. The foundation on which these
approaches rest needs to be strengthened to improve their
credibility. As Averch [1991] states, after describing the huge
social rates of return to investments in hybrid corn reported by
Griliches [1958]: "In general, economists compute high social
rates-of-return to most kinds of research. The rates, in fact, are
usually much higher than those computed for other kinds of public
investment. So there is a puzzle as to why research investments do
not increase until their marginal return just equals returns from
other public investments."

IV-C-1.  APPLICATION  OF  METRICS  TO  RESEARCH  UNDER  GPRA 

The  federal  government  is  the  largest  single  sponsor  of 
fundamental  science  research  today.  Increased  scrutiny  of  federal 
programs  in  the  drive  toward  deficit  reduction  requires  increased 
public  accountability  for  the  stewards  of  the  government's  research 
funds.  The  Government  Performance  and  Results  Act  (GPRA)  of  1993 
[Public  Law  103-62]  was  passed  to  improve  the  accountability  of 
government  funded  programs  by  measurements  of  performance  against 
planned targets. Federal agencies are required to initiate
implementation of GPRA in FY1997; pilot projects [Brown, 1996] will
help identify performance measures for different types of programs.
However,  it  is  extremely  important  that  the  tools  used  to  enforce 
research  accountability  do  not  destroy  basic  research. 

There  are  three  major  components  to  GPRA:  Strategic  plans, 
annual  performance  plans,  and  metrics  to  show  how  well  the  annual 
plans  are  being  met.  Classical  strategic  planning  derives  from  the 
military  and  commercial  world,  focuses  on  the  application  of 
knowledge  toward  a  pre-defined  goal  rather  than  the  search  for 
knowledge, and assumes that the links between plans and targets are
understood.

Annual  performance  plans  are  derived  from  production  and 
service  industries,  where  efficiency  in  the  use  of  known  resources 
to  achieve  well  defined  targets  over  the  performance  period  is  the 
main  goal.  Revolutionary  basic  research,  which  has  yielded  some  of 
the  largest  downstream  payoffs  historically,  has  an  inherently 
large  uncertainty  and  failure  rate,  and  may  take  many  years  before 
results  are  forthcoming.  This  intrinsic  long-time  scale 
characteristic  of  basic  research  conflicts  with  the  short-term 
emphasis  of  much  of  the  corporate  world,  where  annual  reports  and 
requirements  for  quarterly  financial  performance  shorten  the 
production  period  for  research  results.  This  near-term  focus  on 
financial performance has essentially eliminated long-range,
high-risk fundamental research financed from corporate funds in most
industries.

Metrics that gauge adherence to annual performance plans
derive, in modern times, from the time and motion study component
of industrial engineering. Again, these tools measure efficiency
of the use of known resources to achieve specific goals over a set
time period. At present, such output metrics are applied
informally to research for purposes of academic analysis (3), and
these analytical results may provide useful insights into research
activity. Annual application of these quantitative indicators is
more appropriate for measuring the short-term observable outputs
that characterize activity and productivity (cars produced, papers
published) than the long-term outcomes that characterize mission
and societal impact (improving health, enhancing safety). A
major concern of researchers is that the short-term services and
production  orientation  of  the  GPRA  planning  and  metrics  components 
could  re-focus  the  research  away  from  long-range  high-risk 
revolutionary  science  challenges  to  shorter-term  low-risk 
evolutionary  product-oriented  goals.  Annual  application  of  these 
metrics  to  basic  research  in  the  formal  bureaucratic  sense  of  GPRA 
could  convert  the  nature  of  the  research  being  conducted  from  a 
quest  for  knowledge  and  understanding  to  a  drive  for  output 
metrics.  Uncertainties  inherent  in  basic  research  bring  into 
question  the  validity  and  credibility  of  any  long  range  plans  to 
achieve  specific  goals,  since  long-term  research  effectiveness  and 
impact  will  depend  on  economic,  environmental,  and  geopolitical 
factors  not  evident  during  the  research  phase. 

A  more  subtle  concern  is  that  application  of  the  present  GPRA 
approach  to  basic  research  may  effectively  yield  the  same  results 
as  government  imposed  censorship.  The  requirements  of  federal 
agencies  to  display  compliance  with  the  GPRA  metrics  may  reorient 
their  selection  of  research  proposals  to  maximize  these  arbitrary 
measures.  Concepts  that  could  improve  understanding  and  the 
unification  of  science,  but  would  not  optimally  satisfy  the  GPRA 
metrics,  might  no  longer  be  proposed  for  federal  funding  because  of 
lower funding probability. (The author is reminded of
Solzhenitsyn's view that the worst part of documents being
censored was not that sections were rejected; the worst part was
the loss of those ideas which were not even expressed and
eventually no longer considered because of the knowledge that they
would be censored.) Safe, short-term, low-risk evolutionary
research  would  become  the  accepted  practice.  Basic  research  needs 
to  be  decoupled  from  'strategic'  targets  and  GPRA  metrics,  and  the 
scientific  roadblocks  and  challenges  alone  should  be  the  stimuli 
for  research  activity. 

A more appropriate accountability approach for basic research
is: i) articulation of a rational investment strategy; ii) long- and
short-term retrospective studies that show the diverse benefits
from past research and potential future benefits; and iii) quality
control of expert peer review. An organization's research
investment  strategy  is  a  rationale  for  the  prioritization  and 
allocation  of  resources  to  address  knowledge  deficiencies  which 
impede  attainment  of  the  organization's  goals.  Short-term 
retrospective studies show how recent research has affected fields
of science, and may contain projections of future impacts of
research on technologies, systems, and operations. Long-term
retrospective studies of major innovations and outcomes in systems
and technology show the origins of critical research and
development advances in a broad spectrum of fundamental research
performed many decades earlier. Expert peer review on a periodic
basis  will  validate  the  soundness  of  the  investment  strategy  and 
the  importance  of  the  research  accomplishments  and  subsequent 
technology  impacts. 

Peer  review  properly  designed  to  support  GPRA  would  provide 
credible  indication  to  the  research  sponsors  of  intrinsic  program 
quality,  program  relevance,  management  quality,  and  appropriateness 
of  direction,  and  has  the  potential  to  improve  the  quality  of  the 
research  program  as  well.  Before  such  a  review  process  is 
implemented,  a  number  of  considerations  have  to  be  addressed,  and 
they  have  been  described  in  detail  in  the  previous  section  on  peer 
review. 

In  summary,  peer  review  is  the  appropriate  central  evaluation 
mechanism  for  basic  research  under  GPRA,  but  careful  thought  and 
planning  will  be  required  to  implement  a  viable  and  credible  peer 
review  process. 

IV-C-2.  CITATION  ANALYSIS  CROSS-FIELD  NORMALIZATION:  A  NEW 
PARADIGM 

CROSS-FIELD CITATION NORMALIZATION: THE ISSUES

Science,  Nature,  Physics  Today,  Scientometrics,  and  other 
leading  science  and  science  evaluation  journals  continually  publish 
articles  comparing  and  ranking  technical  disciplines,  departments, 
institutions,  countries,  and  people  on  the  basis  of  literature 
citations.  Because  of  differences  in  numbers  of  researchers  in 
different  fields  and  in  citing  cultures,  normalizations  of  absolute 
citation  numbers  to  some  reference  are  required  to  assign  meaning 
to  any  comparisons.  As  shown  in  a  recent  review  of  cross-field 
citation  normalization  techniques,  all  present  methods  normalize 
citations  of  a  given  paper  to  citations  of  similar  theme  papers 
[Schubert, 1993]. The two main differences among these methods
are  how  the  similar  theme  papers  are  defined  (e.g.,  papers 
published  in  same  journal  issue,  papers  sharing  a  threshold  number 
of  common  references,  etc.),  and  what  types  of  mathematical/ 
statistical  approaches  are  used  to  normalize  the  position  of  a 
target  paper  relative  to  that  of  its  competitors.  This  limited 
comparative  approach  allows  relative  comparisons  among  similar 
papers, but ignores two crucial points. Purely relative comparison
with other similar papers does not allow very credible comparisons
among different disciplines based on citation analysis, and does
not provide an indication of citation efficiency.

To  gain  wider  acceptance  and  credibility,  citation  analysis 
needs  to  overcome  these  two  limitations,  and  offer  the  broader 
perspective  of  how  frequently  a  paper  was  cited  compared  to  how 
frequently it could have been cited. The following sections
describe a citation normalization method that would overcome the
above  two  limitations,  and  provide  the  added  dimension  offered  by 
the  broader  perspective. 

CROSS-FIELD  CITATION  NORMALIZATION:  A  NEW  PARADIGM 

The  fundamental  concept  of  the  new  paradigm  was  derived  from 
the  thermodynamic  principle  of  Carnot  efficiency.  The 
thermodynamic  analog  will  be  described  through  an  illustrative 
example, and the metamorphosis to citation efficiency will then be
shown. 

Assume  that  two  classes  of  engines  are  being  evaluated.  One 
class  of  engines  (hereafter  called  fusion  engines)  has  been 
developed  to  convert  energy  being  produced  in  very  high  temperature 
fusion  reactors,  and  the  other  class  (hereafter  called  ocean 
engines)  has  been  developed  to  convert  energy  from  the  temperature 
differentials in the deep ocean. Assume that there are three
different fusion engines being evaluated in the fusion class, and
the demonstrated conversion efficiencies of these engines are 1, 2,
and 3 percent, respectively. Assume that there are three different
ocean engines being evaluated in the ocean class, and the
demonstrated conversion efficiencies of these engines are also 1,
2, and 3 percent, respectively.

If  it  were  desired  to  evaluate  the  performance  quality  of  all 
six  engines,  with  efficiency  being  the  metric  of  quality,  one 
simplistic  approach  would  be  to  rank  all  six  engines  by 
demonstrated  efficiency.  The  fusion  engines  would,  on  average, 
have  equivalent  quality  to  the  ocean  engines  by  this  approach. 
However,  a  far  better  indicator  of  performance  quality  would  be  the 
ratio  of  each  engine's  demonstrated  efficiency  to  the  maximum 
efficiency  the  engine  could  achieve  in  its  operating  environment. 

From thermodynamics, this maximum theoretical efficiency that
each engine could achieve is the Carnot efficiency, which is a
function of the high temperature and low temperature extremes
between which the engine operates. For very high maximum
temperatures and near-ambient low temperatures (characteristic of
fusion), the Carnot efficiency approaches unity, and for low maximum
temperatures and ambient low temperatures (characteristic of
ocean), the Carnot efficiency approaches zero. If the comparison
figure of merit becomes the ratio of demonstrated efficiency to
Carnot efficiency, then the ocean engines in this case would
outperform the fusion engines by a wide margin, since the ocean
engines  are  operating  closer  to  their  theoretical  maximum  than  are 
the  fusion  engines.  Even  where  the  engine  evaluation  is  limited  to 
one  field  (e.g.,  fusion),  viewing  relative  performance  from  the  new 
efficiency  ratio  perspective  provides  an  added  dimension  for 
understanding  performance,  while  the  relative  engine  rankings 
within  fusion  remain  unchanged. 
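
The engine comparison can be worked numerically. The Python sketch below uses the standard Carnot expression, one minus the ratio of the cold to the hot absolute temperature. The reservoir temperatures are illustrative assumptions only, since the text gives none, and the function names are hypothetical.

```python
def carnot_efficiency(t_hot_k, t_cold_k):
    """Maximum theoretical efficiency between two reservoirs (kelvin)."""
    if not 0 < t_cold_k < t_hot_k:
        raise ValueError("require 0 < t_cold_k < t_hot_k")
    return 1.0 - t_cold_k / t_hot_k


def efficiency_ratio(demonstrated, t_hot_k, t_cold_k):
    """Demonstrated efficiency as a fraction of the Carnot limit."""
    return demonstrated / carnot_efficiency(t_hot_k, t_cold_k)


# Assumed temperatures, chosen only to illustrate the two regimes.
FUSION = (1.0e8, 300.0)  # very hot source, near-ambient sink
OCEAN = (298.0, 278.0)   # warm surface water, cold deep water

# Demonstrated efficiencies of 1, 2, and 3 percent in both classes.
demonstrated = [0.01, 0.02, 0.03]
fusion_ratios = [efficiency_ratio(d, *FUSION) for d in demonstrated]
ocean_ratios = [efficiency_ratio(d, *OCEAN) for d in demonstrated]
```

With these assumed temperatures the ocean Carnot limit is about 6.7 percent, so each ocean engine reaches a far larger fraction of its limit than the fusion engine with the same demonstrated efficiency, while the ordering within each class is unchanged.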

Now  the  crossover  from  thermodynamic  efficiencies  to  citation 
efficiencies  will  be  made,  with  use  of  analogs  to  the  above 
example.  For  fusion,  convert  each  engine  into  a  research  paper  of 
similar theme, and convert each engine efficiency into citations
received by the research paper over some unit of time. Thus, there
are  now  three  fusion  research  papers  of  similar  theme  being 
compared  which  have  1,  2,  and  3  citations  over  some  unit  of  time, 
respectively.  Similarly,  for  ocean,  there  are  now  three  ocean 
papers  of  similar  theme  being  compared  which  have  1,  2,  and  3 
citations  over  the  same  unit  of  time,  respectively. 

Generically,  the  existing  orthodox  approach  to  cross-field 
citation  normalization  might  divide  the  number  of  fusion  citations 
by  the  domain  average  (2.0)  and  provide  each  fusion  paper  a 
normalized  value  and  ranking  in  its  class.  Thus,  the  paper  with  3 
citations might have a normalized value of 1.5 (3/2), and an upper
33 percentile ranking. Using similar normalization for the ocean
papers and dividing citations by 2.0 (the domain average), the
paper with 3 citations might have a normalized value of 1.5 (3/2),
and an upper 33 percentile ranking. The existing orthodox approach
would consider the leading paper in each class as the same quality
because of identical ranking in its class (upper 33 percentile).

However, as in the Carnot cycle analogy, a better figure of
merit  for  quality  would  be  the  ratio  of  actual  number  of  citations 
received  by  a  paper  to  the  theoretical  maximum  number  of  citations 
that  could  be  received  by  the  paper,  a  quantity  which  will  be 
termed  the  citation  efficiency.  Then,  different  papers  in  the  same 
field,  as  well  as  papers  in  different  fields,  could  be  compared  on 
the  basis  of  citation  efficiency.  The  citation  efficiency  becomes 
the  cross-field  normalizer,  and  indicates  how  well  a  paper 
performed  from  a  citation  perspective  compared  to  how  well  it  could 
have  performed.  It  is  an  intrinsic  measure  of  accomplishment. 
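
The limitation of the orthodox approach is easy to reproduce. The following Python sketch applies the domain-average normalization described above to the two sets of citation counts; the function name is hypothetical, and real normalization schemes are more elaborate than this single division.

```python
def mean_normalized(citations):
    """Divide each paper's citation count by its domain average."""
    average = sum(citations) / len(citations)
    return [count / average for count in citations]


# Citation counts from the example: 1, 2, and 3 in each field.
fusion = mean_normalized([1, 2, 3])
ocean = mean_normalized([1, 2, 3])
```

Both lists come out as 0.5, 1.0, and 1.5, so the two fields are indistinguishable under this normalization, whatever the number of papers that could have cited them.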

DETERMINATION  OF  CITATION  EFFICIENCY 

There  are  two  crucial  steps  involved  in  determining  the 
citation  efficiency,  and  they  are  not  completely  independent.  To 
compare  a  target  paper  to  other  papers,  the  first  step  is  the 
selection  of  the  universe  of  papers  to  be  compared  and  the  second 
step  is  the  determination  of  the  maximum  number  of  citing  papers  to 
be  used  in  the  computation  of  efficiency.  For  present  purposes, 
assume  that  a  universe  of  papers  to  be  compared  to  the  target  paper 
has  been  selected  using  existing  techniques.  Again,  for  present 
purposes,  assume  that  this  universe  consists  of  sub-universes  of 
papers  with  similar  themes.  Thus,  the  universe  of  fusion  and  ocean 
papers  consists  of  a  fusion  sub-universe  with  similar  themes  and  an 
ocean sub-universe with similar themes.

Next  comes  the  determination  of  the  maximum  number  of 
potential  citing  papers.  The  following  theme-centered  approach  is 
proposed  for  computing  maximum  potential  citations.  For  the  fusion 
papers  within  the  similar  theme  sub-universe,  the  maximum  number  of 
times  one  of  the  fusion  papers  could  have  been  cited  (in  the  given 
unit  of  time)  is  assumed  to  be  equal  to  the  number  of  different 
citing  papers  in  which  any  of  the  papers  in  the  fusion  sub-universe 
were  cited.  Any  of  these  citing  papers  could  have  cited  0,  1,  or 
all of the similar theme fusion sub-universe papers. The same
procedure for determining the maximum applies to the ocean papers,
but the fusion maximum will probably be quite different from the
ocean maximum. Then the citation efficiency of each paper in the
selected universe can be computed, and the papers compared by this
figure of merit. The actual number of citations of each fusion
paper would be divided by the fusion paper maximum (this maximum is
the same for all the fusion sub-universe papers) to arrive at the
efficiency, and the actual number of citations of each ocean paper
would be divided by the ocean paper maximum (this maximum is the
same for all ocean sub-universe papers) to arrive at the
efficiency.

The following figures illustrate how such an efficiency
computation would be performed. Figure 1 is a matrix showing how
many times each citing paper (A, B, C) cites each cited paper (G,
H, I) for the ocean case.

FIGURE 1 - CITING PAPER VS CITED PAPER MATRIX: OCEAN

                    CITING PAPER
                    A    B    C
   CITED      G     X    X    X
   PAPER      H     X    X
              I     X

Each X in the matrix represents a citation. Thus,
citing paper A cites papers G, H, and I, while citing paper C cites
only paper G. The maximum number of potential citations for papers
G, H, or I is 3, because there are three citing papers. The
citation efficiency of G is 1 (3/3); the efficiency of H is .67
(2/3); and the efficiency of I is .33 (1/3).

Figure 2 is the same type of matrix for the fusion papers.
The citing pattern has been changed.

FIGURE 2 - CITING PAPER VS CITED PAPER MATRIX: FUSION

                    CITING PAPER
                    A'   B'   C'   D'   E'   F'
   CITED      G'    X    X    X
   PAPER      H'                   X    X
              I'                             X

Now,  each  citing  paper  (A'  through  F')  cites  only  one  of  the  fusion
papers  (G'  through  I').  The  maximum  number  of  potential  citations  for
papers  G',  H',  or  I'  is  6,  because  now  there  are  six  citing  papers.
The  citation  efficiency  of  G'  is  .5  (3/6);  the  efficiency  of  H'  is
.33  (2/6);  and  the  efficiency  of  I'  is  .17  (1/6).
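
The per-paper computation for Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name is hypothetical, and the assignment of specific fusion citing papers to cited papers is an assumed reading of Figure 2 (only the counts per cited paper affect the result).

```python
# Citation matrices: each cited paper maps to the set of papers citing it.
# The ocean sets follow Figure 1; the fusion sets are an assumed reading of
# Figure 2 in which each citing paper cites exactly one fusion paper.
ocean = {"G": {"A", "B", "C"}, "H": {"A", "B"}, "I": {"A"}}
fusion = {"G'": {"A'", "B'", "C'"}, "H'": {"D'", "E'"}, "I'": {"F'"}}


def citation_efficiency(cited_by, citing_papers):
    """Return actual citations / maximum potential citations per cited paper."""
    maximum = len(citing_papers)  # same maximum for every paper in the sub-universe
    return {paper: len(citers) / maximum for paper, citers in cited_by.items()}


ocean_eff = citation_efficiency(ocean, {"A", "B", "C"})
fusion_eff = citation_efficiency(fusion, {"A'", "B'", "C'", "D'", "E'", "F'"})

for paper, eff in {**ocean_eff, **fusion_eff}.items():
    print(f"{paper}: {eff:.2f}")
```

Because the maximum is computed separately for each thematic sub-universe, papers with identical raw citation counts (G and G', for instance) receive different efficiencies.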

Under  the  present  normalization  system,  paper  G  would  have 
been  rated  as  the  same  quality  as  paper  G',  since  each  ranked  first 
in  its  own  thematic  sub-universe,  and  paper  I  would  have  been  rated 
as  the  same  quality  as  paper  I',  since  each  ranked  last  in  its  own 
thematic  sub-universe.  Under  the  new  system  proposed  here,  paper  G 
ranks  above  paper  G',  and  paper  I  ranks  above  paper  I'.  This  is
displayed  more  graphically  in  Figure  3,  where  the  citation
efficiencies  of  the  ocean  papers  are  obviously  higher  than  their
fusion  counterparts.

FIGURE  3  -  CITATION  EFFICIENCY  VS  NUMBER  OF  CITATIONS

Aggregate  citation  efficiencies  may  also  be  defined.  Assume
the  aggregate  citation  efficiency  of  the  group  of  ocean  papers  (G,
H,  I  from  figure  1)  were  desired.  This  quantity  is  the  ratio  of
the  number  of  citations  received  by  papers  G,  H,  and  I  (the  number
of  X's  in  figure  1)  to  the  maximum  number  of  times  these
papers  could  have  been  cited  (the  number  of  matrix  elements  in
figure  1).  For  the  figure  1  example,  this  aggregate  citation
efficiency  is  .67  (6/9),  and  for  figure  2  this  aggregate  citation
efficiency  is  .33  (6/18).
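
As a minimal sketch of the aggregate calculation just described (the function and variable names are illustrative, not part of the original method), the ratio of citations received to matrix elements can be computed as follows:

```python
def aggregate_efficiency(citation_counts, n_citing):
    """Total citations received divided by the number of matrix elements.

    citation_counts: citations received by each cited paper in the group.
    n_citing: number of citing papers (columns of the matrix).
    """
    matrix_elements = len(citation_counts) * n_citing
    return sum(citation_counts) / matrix_elements


# Citation counts for G, H, I (figure 1) and G', H', I' (figure 2).
ocean_aggregate = aggregate_efficiency([3, 2, 1], n_citing=3)   # 6 / 9
fusion_aggregate = aggregate_efficiency([3, 2, 1], n_citing=6)  # 6 / 18

print(round(ocean_aggregate, 2), round(fusion_aggregate, 2))
```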

This  example  illustrates  the  added  dimension  provided  by  the
citation  efficiency  perspective:  the  ability  to  evaluate  and
interpret  research  paper  utilization  patterns  within  and  across
different  disciplines.  Is  the  difference  in  aggregate  efficiencies
due  to  a  different  level  of  awareness  of  ocean  and  fusion  authors
of  the  intellectual  foundations  of  their  respective  fields,  and/or
is  the  difference  due  to  the  different  levels  of  quality  and
uniqueness  of  the  intellectual  foundation  papers  in  the  different
fields,  and  therefore  different  citation  desirability  of  these
papers?  What  other  factors  are  operative  [2,  3]?

Finally,  the  'quality'  of  different  citing  journals  (or  any 
other  quantified  parameters  associated  with  each  journal)  may  be 
incorporated  in  the  citation  efficiency  by  computing  a  quality- 
weighted  citation  efficiency,  or  a  quality-weighted  aggregate 
citation  efficiency. 
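
The text does not specify how the quality weighting would be defined. One possible reading, shown here purely as an assumption, is to replace the count of citing papers with the sum of a quality weight attached to each citing paper's journal. The weights and the function name below are hypothetical.

```python
def weighted_efficiency(citers, weights):
    """Sum of weights of the papers that cite / sum of weights of all citing papers.

    This weighting scheme is an assumed definition, not one given in the text.
    """
    total_weight = sum(weights.values())
    return sum(weights[c] for c in citers) / total_weight


# Hypothetical journal-quality weights for the citing papers of Figure 1.
weights = {"A": 2.0, "B": 1.0, "C": 1.0}

eff_H = weighted_efficiency({"A", "B"}, weights)  # H is cited by A and B
eff_I = weighted_efficiency({"A"}, weights)       # I is cited by A only

print(eff_H, eff_I)
```

With all weights equal, this reduces to the unweighted citation efficiency.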


SUMMARY 

A  new  paradigm  for  comparing  quality  of  published  papers 
across  different  disciplines  has  been  proposed.  This  method  uses
as  its  figure  of  merit  the  ratio  of  actual  citations  received  to  the
potential  maximum  number  of  citations  that  could  have  been 
received.  It  is  analogous  to  approaches  used  to  compare 
performance  in  physical  systems,  and  appears  intrinsically  more 
useful  than  present  approaches. 

IV-C-3.  THE  PIED  PIPER  EFFECT:  A  SPECIFIC  EXAMPLE

An  article  in  Science  magazine  purports  to  identify  the  Top  10 
U.S.  Universities  in  Clinical  Medical  Research  from  1990-1994 
[SCIENCE,  1995].  The  published  papers  and  citations  per  paper  are 
ranked  in  decreasing  frequency  by  medical  research  institution,  and 
the  institutions  with  the  highest  frequencies  of  publications  and 
citations  are  identified  as  the  top  universities  in  clinical 
medicine  research.  This  Science  article  crystallizes  the  problem 
of  using  metrics  as  a  gauge  of  research  productivity  and,  by 
inference,  quality.  This  statement  will  be  amplified  with  an 
illustrative  example  which  questions  the  linkage  between  high 
research  output  and  high  research  quality.  The  example  focuses  on
cataracts,  but  can  be  extrapolated  to  other  chronic  systemic
problems  as  well.

The  author  recently  did  a  literature  survey  of  research  papers 
related  to  cataracts.  The  author  examined  four  years  (1991-1994) 
of  abstracts  from  the  Science  Citation  Index  (SCI)  and  the  Social 
Science  Citation  Index  (SSCI).  Of  the  many  hundreds  of  abstracts
identified,  perhaps  99%  dealt  with  different  aspects  of  the
surgical  treatment  of  cataracts.  Maybe  1%  or  less  dealt  with
nutritional  approaches,  and  these  were  mainly  vitamin  and  mineral
supplementation  for  prevention.  There  were  no  papers  in  these
peer-reviewed  journals  dealing  with  alternative  approaches  to
cataract  treatment.

The  mainstream  medical  community  views  cataracts  strictly  as 
an  eye  problem.  The  lens  degenerates  for  unknown  reasons,  in  their 
view,  and  when  it  has  deteriorated  sufficiently,  it  should  be 
replaced  surgically. 

An  alternative  paradigm  is  that  the  body  experiences  chronic 
systemic  problems  (deficiencies  of  various  types),  and  these
problems  manifest  themselves  as  symptoms  in  specific  organs.  For
some  people,  the  weak  organ  is  the  eye,  and  the  symptom  is  the
cataract.  Healing,  in  this  paradigm,  consists  of  identifying  and
eliminating  the  deficiencies.  Surgically  removing  the  cataract,
while  improving  functioning  (at  least  temporarily),  does  nothing  to
address  the  fundamental  systemic  problems  which  are  at  the 
foundation  of  the  cataract's  presence.  It  is  equivalent  to 
removing  the  warning  light  on  a  car's  dashboard  when  it  signifies  a 
problem. 

These  alternative  approaches  never  surface  in  the  peer 
reviewed  literature,  as  the  author's  survey  has  shown.  The  journal 
reviewers  (and  the  funding  proposal  reviewers  as  well)  are 
researchers  trained  along  the  orthodox  paradigms,  and  they  provide 
high  marks  to  those  papers  (and  proposals)  aligned  with  the 
reviewers'  backgrounds.  In  addition,  there  are  institutional  and 
commercial  biases  which  also  govern  the  willingness  of  the
reviewers  and  editors  (and  sponsors)  to  provide  positive
evaluations  of  alternative  approaches.  Thus,  the  copious  papers
and  citations  (and  grants)  from  this  component  of  medical  research
reflect  activity  among  a  closed  group  whose  members  subscribe  to
essentially  the  same  orthodox  paradigm.  Far  from  being  a  measure
of  quality,  the  numbers  of  papers  and  citations  (and  projects)  from 
some  branches  of  medical  research  could  be  interpreted  as  a  measure 
of  the  extent  of  the  problem. 

Near  the  beginning  of  section  VII  (and  scattered  throughout
this  Handbook  as  well),  the  author  differentiates  between  the  two
major  characteristics  of  high  quality  science:  doing  the  job  right
and  doing  the  right  job  (in  the  best  of  all  worlds,  the  right  job
would  be  done  right).  The  Science  article  is  an  example  of  doing
the  job  right.  Once  the  research  target  has  been  selected
(the  paradigm  of  using  the  surgical  approach  to  eliminate  cataracts),
the  orthodox  medical  research  community  performs  an  excellent  and
highly  productive  effort  in  finding  the  best  ways  to  achieve  the
target.  However,  one  can  question  seriously  whether  they  are
doing  the  right  job  (using  the  right  paradigm);  the  effort  is
analogous  to  firing  a  missile  very  accurately  at  the  wrong  target.
The  present  closed  funding,  review,  and  publication  structure
effectively  precludes  innovations  which  will  address  the  right  job.

The  Science  article,  and  the  above  comments,  illustrate  the 
danger  of  relying  on  metrics  to  infer  quality  from  scientific 
activity.  Metrics  have  their  place  in  a  comprehensive  evaluation
procedure  of  research,  as  the  previous  section  has  shown,  but  as  a
stand-alone  approach,  as  reflected  in  the  Science  article,  metrics
are  subject  to  misinterpretation.

V.  RECOMMENDED  AREAS  FOR  RESEARCH  IN  RIA 

V-A.  Semi-Quantitative  Methods

The  Hindsight,  TRACES,  and  ARPA  studies  provided  valuable 
insight  into  the  parameters  which  affect  the  quality  and 
productivity  of  research.  These  types  of  studies  should  be 
expanded.  More  organizations,  such  as  in  the  ARPA  study,  should  be 
examined  from  a  retrospective  viewpoint.  More  technologies  and 
systems,  as  in  the  TRACES  and  Hindsight  studies,  should  be 
examined.  However,  more  emphasis  should  be  placed  on  identifying
and  tracing  the  pathways  of  the  indirect  impacts  of  research.
Especially  for  basic  research,  the  research  products  are
disseminated  broadly,  impacting  eventually  not  only  the  sponsoring
organization's  goals,  but  the  broader  societal  goals  as  well.
These  broader  impacts  should  be  captured  within  the  studies.

The  latest  technologies,  such  as  information  processing  and 
computer  hardware  and  software,  should  be  employed  in  these 
retrospective  approaches.  As  suggested  previously,  in  the  section 
describing  the  recent  TRACES  study  [Narin,  1989],  citation  and
co-citation  analysis,  combined  with  co-word  analysis,  could  trace  some
of  the  indirect  impact  pathways.  Citations  of  successive
generations  of  papers,  for  example,  could  document  the  diffusion
and  dissemination  of  the  products  of  research.
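
One way to picture the tracing of successive citation generations is a breadth-first walk over a "cited by" graph, starting from an originating research paper. The sketch below is illustrative only; the paper identifiers and the graph are hypothetical, and a real study would draw the links from a citation index.

```python
from collections import deque


def citation_generations(cited_by, root):
    """Assign each paper the smallest number of citation steps from the root.

    cited_by maps a paper to the papers that cite it. Generation 1 cites the
    root directly, generation 2 cites a generation-1 paper, and so on.
    """
    generation = {root: 0}
    queue = deque([root])
    while queue:
        paper = queue.popleft()
        for citer in cited_by.get(paper, ()):
            if citer not in generation:  # keep the earliest generation found
                generation[citer] = generation[paper] + 1
                queue.append(citer)
    return generation


# Hypothetical citation links: P0 is the originating research paper.
cited_by = {"P0": ["P1", "P2"], "P1": ["P3"], "P2": ["P3", "P4"], "P3": ["P5"]}
generations = citation_generations(cited_by, "P0")
print(generations)
```

Counting the papers (or patents) in each generation gives a rough picture of how far and how quickly the original result has diffused.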

Alternatively,  network  approaches  could  explore  the
information  flow  among  research  and  technology  areas.  Combined
with  co-nomination  techniques,  these  approaches  could  shed  light
not  only  on  information  dissemination,  but  also  on  the  people
involved  in  the  diffusion  process.  Since  these  bibliometric,  network,
and  co-occurrence  approaches  were  presented  in  the  text  as  examples
of  quantitative  techniques,  but  are  listed  now  under  proposed
semi-quantitative  studies,  they  will  not  be  repeated  later  in  the
section  on  proposed  quantitative  studies.  In  the  present  context
of  being  combined  with  the  retrospective  approaches,  they  become
hybrid  approaches  (quantitative/semi-quantitative).

Central  to  credible  work  in  tracking  the  diffusion  of 
information  from  research  is  a  database  of  research  products  at 
various  evolutionary  stages  which  can  feed  the  models.  Since  the 
research  product  evolutionary  pathways  transcend  the  research 
originating  organization,  and  can  intersect  all  societal  sectors, 
the  cooperation  of  many  public  and  private  organizations  would  be 
required  to  develop  a  database  of  research  products  in  their 
evolutionary  stages.  Development  and  construction  of  such  a 
database  should  start  now. 

V-B.  Peer  Review 

One  of  the  central  problems  in  peer  review  is  lack  of 
credibility  in  its  predictive  reliability.  More  studies  are 
necessary  to  relate  evaluations  by  peers  of  research  proposals  and 
existing  research  programs  to  future  impacts  of  this  research. 
Presently,  the  data  to  validate  different  predictive  models  do
not  exist.  As  stated  above  and  reiterated  in  section  V-D,  what  is
required  is  a  database  which  allows  tracking  of  the  evolution  of
products  of  research  in  their  various  metamorphosed  stages.

Having  such  a  database  would  allow  not  only  validation  of  peer 
review  predictive  models,  but  bibliometric  predictive  models  and 
other  quantitative  predictive  models  as  well.  The  database  would 
allow  predictive  reliability  to  be  determined  for  a  number  of 
different  types  of  impact.  These  would  include  impact  on  the 
research  area  of  interest,  impact  on  allied  research  areas,  impact 
on  technology,  impact  on  systems,  impact  on  operations,  etc. 

An  excellent  discussion  of  the  validity  and  reliability  of  the
peer  review  results  can  be  found  in  Cicchetti  [1991],  as  well  as  in
other  commentary  in  the  journal  issue  in  which  Cicchetti's  article
appears.  To  improve  validity  and  reliability,  research  needs  to  be
done  to  determine  the  optimal  number  of  reviewers;  to  ascertain
whether  author  anonymity  impacts  the  results;  and  to  ascertain  whether
training  people  to  perform  peer  reviews  would  increase  review
quality  as  well  as  reliability  and  validity.

There  are  very  few  comparative  studies  of  different  types  of 
peer  groupings  and  the  quality  of  the  peer  review  product.  Studies 
should  be  done  varying  mail  vs.  panel  review,  British  model  vs. 
standard  model  (peer  review  using  professionals  instead  of  eminent 
persons),  panel  size,  types  of  reviewer  expertise,  time  expended  by
the  reviewers  and  reviewees  on  the  process,  and  correlating  these
variables  with  the  quality  of  the  product.  Central  to  the  result
would  be  how  cost  of  the  review  varies  with  quality  of  the  product 
and  is  affected  by  the  different  variables. 

While  the  present  Handbook  included  a  very  approximate 
estimation  of  total  peer  review  time  and  dollar  costs  for  one  peer 
review  scenario,  more  accurate  time  and  cost  estimates  would  be 
required  when  comparing  different  types  of  peer  review  scenarios. 
Extensive  data  taking  would  be  necessary,  because  of  the  many 
different  types  of  peer  reviews  in  existence.  However,  since  total 
peer  review  costs  can  be  substantial,  and  since  cost  reduction  with
consistent  quality  would  be  one  of  the  goals  of  these  different
types  of  suggested  studies,  both  the  extensive  data  taking  and 
development  of  improved  peer  review  cost  estimating  procedures 
would  be  well  justified  from  an  economic  viewpoint. 

The  application  of  expert  systems  and  knowledge-based  systems
for  proposal  evaluation  and  program  review  could  supplement  peer
review.  Few  studies  have  been  done  along  these  lines,  but  a  recent
dissertation  [Odeyale,  1993]  and  follow-on  studies  [Odeyale  and
Kostoff,  1994a,  1994b]  address  this  problem  in  detail.  Much  more
work  would  be  required  to  validate  the  application  of  these 
advanced  technologies  as  useful  supplements  to  peer  review,  but 
more  research  in  this  direction  could  determine  whether  there  is 
potential  for  real  payoff. 

One  of  the  potential  benefits  resulting  from  a  peer  review  is 
constructive  feedback  to  the  reviewee(s)  followed  by  an  improvement 
in  the  reviewee's  conduct  of  research.  Studies  should  be  done  to 
ascertain  reviewees'  perceptions  of  the  peer  review  and  the 
review's  value  in  improving  the  conduct  of  research.  A  recent 
study  [Luukkonen,  1993]  addresses  peer  review  from  the  reviewee's 
perspective,  but  much  more  can  be  done  to  improve  the  information
transfer  from  the  reviewers  to  the  reviewee,  and  to  ensure  that  the
review's  recommendations  are  translated  into  improved  research.

Finally,  there  are  non-tangible  aspects  of  peer  reviews  on 
which  research  could  provide  valuable  information.  For  example,  in 
many  periodically  scheduled  reviews,  relatively  few  programs 
receive  poor  grades.  This  leads  to  the  critique  that  the  reviews 
are  not  cost-effective;  too  much  time  and  effort  are  being  expended 
for  too  little  return.  The  rationale  supporting  the  reviews  is 
that  the  knowledge  that  the  review  will  occur  maintains  a  high 
threshold  of  research  quality.  Performers  will  be  less  inclined  to 
work  on  their  theses  for  decades  if  they  know  that  they  will  be 
evaluated  periodically.  The  analogy  is  to  a  well-known  speed  trap
on  a  highway.  The  knowledge  that  a  stretch  of  road  is  well  policed
is  sufficient  to  keep  the  average  speed  within  the  posted  limit.
The  fact  that  the  officers  write  relatively  few  tickets  in  this
area  is  not  a  measure  of  the  effectiveness  of  the  speed  trap.  Studies
should  be  done  comparing  the  quality  of  research  of  periodically
reviewed  programs  to  that  of  programs  reviewed  infrequently  on  an
ad  hoc  basis,  to  see  if  this  supporting  rationale  is  in  fact  valid.

At  the  other  end  of  the  spectrum,  would  the  knowledge  of
periodically  scheduled  reviews  stifle  the  pursuit  and  presentation
of  very  innovative  but  far-out  ideas?  Would  performers  be
reluctant  to  present  these  ideas  in  a  public  forum,  where  the
credibility  of  the  performers  could  be  challenged  for  these  ideas?
Research  is  needed  to  ascertain  whether  ideas  have  been  suppressed 
in  periodically  reviewed  programs,  and  to  determine  how  this 
problem  could  be  surmounted. 

V-C.  Quantitative  Methods 

In  the  practical  use  of  bibliometrics,  one  of  the  problems
which  arises  is  the  cross-discipline  comparison  of  outputs.  For
example,  how  should  the  paper  or  citation  output  of  a  program  in
Solid-State  Physics  be  compared  to  that  of  Shallow  Water  Acoustics?
What  types  of  normalizations  are  required  to  allow  comparisons
among  these  different  types  of  programs  and  fields?  Is  there  a
threshold  for  disaggregation  below  which  the  normalization  factors
apply  to  all  the  subfields?  For  example,  can  the  normalization
factor  for  Acoustics  be  applied  to  a  program  in  High  Frequency 
Shallow  Water  Acoustics,  or  can  the  normalization  factor  for 
Shallow  Water  Acoustics  be  applied  to  the  program  in  High  Frequency 
Shallow  Water  Acoustics? 

Or,  is  credible  normalization  not  possible?  A  recent  survey
of  important  research  performance  indicators  [Australia,  1993]  was
described  in  the  bibliometrics  section  of  the  present  Handbook.
The  survey  results  indicated  that  the  important  performance
indicators  may  rank  differently  for  different  disciplines.  This
suggests  that  multiple  indicators  would  be  required  for  any
cross-field  comparisons.  Under  these  circumstances,  cross-discipline
comparisons  would  require  not  only  normalizations  for  the  same
indicators,  but  also  some  type  of  weighting  correction  to  account  for
the  different  relative  importance  of  the  indicators  in  different
disciplines.  More  research  on  these  issues  needs  to  be  done  to
make  cross-discipline  comparisons  using  bibliometrics  more
acceptable.

An  area  of  bibliometrics  which  has  been  gaining  in  popularity
over  the  past  decade  is  that  of  partial/multiple  indicators
[e.g.,  Martin,  1983;  Rubenstein,  1988,  1991].  In  some
applications,  different  partial  indicators  are  combined  to  give  an
overall  figure  of  merit.  A  number  of  research  issues  need  to  be
addressed  here.  If  the  indicators  do  not  form  an  orthogonal  set,
there  will  be  multiple  counting,  and  the  results  will  be  skewed.
As  a  hypothetical  example,  if  it  were  shown  that  publications  were
strongly  correlated  with  awards,  then  including  publications  and 
awards  in  the  figure  of  merit  would  be  a  double  counting  of 
publications.  There  needs  to  be  research  showing  how  the  different 
leading  indicators  are  related  to  each  other,  whether  the 
relationship  varies  for  different  disciplines,  and  the  degree  to 
which  the  different  indicators  overlap. 
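
A first step toward the overlap research suggested above would be to measure how strongly pairs of indicators are correlated. The sketch below computes a Pearson correlation coefficient by hand; the indicator values are invented for illustration and do not come from any study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical indicator values for five research groups.
publications = [2, 4, 6, 8, 10]
awards = [1, 2, 3, 4, 5]    # moves in lockstep with publications
patents = [3, 1, 4, 1, 5]   # only loosely related to publications

r_pub_awards = pearson(publications, awards)
r_pub_patents = pearson(publications, patents)
print(round(r_pub_awards, 2), round(r_pub_patents, 2))
```

A coefficient near 1 for a pair of indicators would signal that including both in a figure of merit largely counts the same thing twice.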

Typically,  the  indicators  are  combined  in  a  linear  manner  to 
arrive  at  the  figure  of  merit.  In  addition  to  the  problem  that  the 
weighting  factors  may  be  field-dependent,  as  discussed  in  the 
section  on  cross-discipline  comparison  above,  the  linear  assumption
may  be  invalid  over  the  full  range  of  the  indicators.  For  example,
marginal  utility  theory  would  suggest  that  while  it  might  be  twice
as  valuable  for  a  researcher  to  publish  two  papers  per  year
compared  to  one  paper,  it  would  probably  not  be  twice  as  valuable
if  the  researcher  were  to  publish  40  papers  per  year  as  opposed  to
20.  It  certainly  would  not  be  40  times  as  valuable  if  the 
researcher  were  to  publish  40  papers  per  year  as  opposed  to  one 
paper  per  year.  Research  needs  to  be  done  to  identify  the  utility 
functions  for  these  indicators,  and  identify  the  regions  where  the 
linear  assumption  is  valid. 
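
The shape of such a utility function is an open research question. Purely as an illustration, the sketch below uses a saturating exponential, which is nearly linear at low output and flattens at high output. The functional form and the saturation scale of 10 papers per year are assumptions chosen for the example, not values from the literature.

```python
from math import exp

SATURATION = 10.0  # hypothetical scale (papers per year) at which value levels off


def paper_value(papers_per_year):
    """Assumed concave utility: close to linear for small outputs, then saturating."""
    return SATURATION * (1.0 - exp(-papers_per_year / SATURATION))


ratio_2_vs_1 = paper_value(2) / paper_value(1)      # close to 2: near-linear region
ratio_40_vs_20 = paper_value(40) / paper_value(20)  # well below 2
ratio_40_vs_1 = paper_value(40) / paper_value(1)    # far below 40

print(round(ratio_2_vs_1, 2), round(ratio_40_vs_20, 2), round(ratio_40_vs_1, 2))
```

Under this assumed curve the linear approximation holds only for outputs that are small relative to the saturation scale, which is the kind of region of validity the proposed research would need to establish empirically.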

One  rapidly  emerging  area,  for  which  substantial  databases  are 
in  existence,  is  patent  citation  analyses.  Yet  there  has  been 
negligible  use  of  this  capability  by  the  Federal  government  for 
research  impact  assessment,  and  assessment  of  the  conversion  of 
science  to  technology.  Studies  should  be  done  to  ascertain  the 
regions  of  validity  of  patent  citation  analysis,  and  the 
constraints  and  limitations  of  the  technique.  For  those 
technologies  and  research  disciplines  where  the  technique  has 
validity,  studies  should  be  done  using  patent  citation  analysis  to 
track  the  diffusion  of  research  information.  Perhaps  the  technique 
could  be  used  in  tandem  with  the  other  citation  approaches  in 
supplementing  the  retrospective  approaches  suggested  in  the  section 
on  proposed  semi-quantitative  studies.  It  would  be  valuable  to 
understand  the  parameters  which  influence  the  successful  conversion 
of  science  to  technology. 

A  number  of  specific  studies  are  suggested  for  large
multi-spectrum  Federally-supported  laboratories,  to  ascertain  whether
these  organizations  are  making  effective  and  efficient  use  of  their
multi-discipline  capabilities:

1.  Examine  distribution  of  disciplines  in  co-authored  papers, 
to  see  whether  the  multidisciplinary  strengths  of  the  lab  are  being 
utilized  fully; 

2.  Examine  distribution  of  organizations  in  co-authored
papers,  to  determine  the  extent  of  lab  collaboration  with
universities/industry/other  labs  and  countries;

3.  Examine  nature  (basic/applied)  of  citing  journals  and
other  media  (patents),  to  ascertain  whether  lab's  products  are
reaching  the  intended  customer(s);

4.  Determine  whether  the  lab  has  its  share  of  high  impact
(heavily  cited)  papers  and  patents,  viewed  by  some  analysts  as  a
requirement  for  technical  leadership;

5.  Determine  which  countries  are  citing  the  lab's  papers  and
patents,  to  see  whether  there  is  foreign  exploitation  of  technology
and  in  which  disciplines;

6.  Identify  papers  and  patents  cited  by  the  lab's  papers  and
patents,  to  ascertain  degree  of  lab's  exploitation  of  foreign  and
other  domestic  technology;

7.  Compare  the  lab's  output  (papers/citations  normalized  over
disciplines)  with  that  of  other  similar  institutions,  taking  into
account  the  concerns  above  on  cross-discipline  normalization.

The  production  function  approach  described  in  the  text 
[Averch,  1987,  1989]  essentially  regresses  desirable  research 
outputs  (citations  per  dollar,  etc.)  against  research  inputs
(quality  of  the  investigator's  department,  etc.).  One  potential
application  is  prediction  of  high  output  proposals  based  on  prior
knowledge  of  the  investigator  and  proposal  characteristics  (the
research  inputs).  This  could  be  a  useful  supplement  for  proposal
peer  reviews,  especially  in  those  cases  where  quality  differences 
among  different  proposals  are  not  large,  and  use  of  prior  knowledge 
could  impact  the  outcome.  Studies  should  be  done  to: 

1.  identify  the  appropriate  output  measures; 

2.  identify  the  appropriate  input  measures; 

3.  estimate  the  production  functions  for  different 
disciplines; 

4.  provide  some  understanding  of  the  predictive  reliability  of 
the  approach. 
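
To make the regression idea concrete, the sketch below fits a straight line relating one research input to one research output by ordinary least squares. It is a deliberately simplified, single-input stand-in and is not the specification used by Averch; the data values and variable names are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: department quality rating (input) and
# citations per unit of funding (output) for five past projects.
dept_quality = [1.0, 2.0, 3.0, 4.0, 5.0]
citations_per_dollar = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(dept_quality, citations_per_dollar)

# Predicted output for a new proposal from a department rated 3.5.
predicted = intercept + slope * 3.5
print(round(slope, 2), round(intercept, 2), round(predicted, 3))
```

A realistic production function would involve several inputs and would be estimated separately by discipline, as items 1 through 4 indicate.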

For  agencies  which  sponsor  some  accelerated  research  programs, 
or  which  have  the  charter  of  funding  accelerated  research  programs 
to  hasten  transitions,  marginal  cost-benefit  studies  of  the  type 
used  by  Mansfield  [1991]  should  be  made  to  study  the  research 
impacts.  Applications  of  these  approaches  to  the  early  stages  of 
basic  research  should  be  evaluated,  such  that  the  indirect  impacts 
of  basic  research  are  given  appropriate  credit  in  an  economic 
sense. 

For  mapping  the  structures  of  different  fields  of  science  and 
technology,  comparative  studies  should  be  done  of  co-word,
co-citation,  and  co-nomination  approaches,  and  hybrid  combinations  of
these  co-occurrence  techniques.  There  should  be  synergistic 
benefits  from  the  hybrid  approaches,  since  different  complementary 
data  are  used  in  each  approach. 

For  the  full-text  co-word  analysis,  automated  data  analysis 
and  interpretation  techniques  should  be  developed  to  reduce  the 
labor  intensity  of  the  process.  The  full-text  technique  should  be 
applied  to  technical  journals  to  identify  emerging  research  and 
technology  areas,  as  well  as  the  evolving  structure  of  the 
technical  discipline.  For  example,  with  present  desktop  computer 
memory  capabilities,  full-text  co-word  analysis  could  be  applied  to 
one  or  more  years'  issues  of  the  Journal  of  the  American  Chemical
Society  to  identify  the  emerging  research  areas  in  Chemistry,  and
to  provide  some  understanding  of  the  inter-relationships  among  the
different  areas  in  Chemistry  (and  perhaps  among  Chemistry  and  other
discipline  areas  as  well).
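
The core counting step of a co-word analysis can be sketched briefly: tally how often pairs of words occur near each other in the text. The window size, the whitespace tokenization, and the sample sentence below are all simplifying assumptions; a production analysis of full journal text would also need stop-word filtering and phrase handling.

```python
from collections import Counter


def coword_counts(text, window=2):
    """Count unordered word pairs that occur within `window` words of each other."""
    words = text.lower().split()
    pairs = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


# Hypothetical snippet standing in for full journal text.
sample = "catalyst surface reaction catalyst surface chemistry"
pairs = coword_counts(sample, window=2)
print(pairs.most_common(3))
```

Pairs with high counts suggest closely linked themes, while the pattern of links among pairs gives a rough map of the field's structure.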

V-D.  Database  Infrastructure  Development 

Research  is  the  pursuit  and  production  of  knowledge. 
Underpinning  research  is  the  generation,  flow,  synthesis,  and 
interpretation  of  information.  Central  to  the  assessment  of 
research  is  the  capability  to  handle  all  phases  of  the  information 
creation,  flow,  and  integration  cycle.  The  explosion  of  available 
information  in  the  last  decade  requires  the  utilization  of  large 
databases  to  handle  this  information  in  support  of  RIA. 

In  particular,  sophisticated  data  collection,  analysis,  and 
interpretation  schemes  can  track  the  dissemination  of  information
flowing  from  research  to  other  applications.  A  credible  research 
product  tracking  scheme  can  help  identify  the  indirect  impacts  of 
research  more  precisely,  and  can  improve  correlations  between 
research  evaluation  predictions  (such  as  peer  review  and 
bibliometrics)  and  downstream  impacts. 

Comprehensive  databases  describing  sponsored  research  and 
development  programs  in  many  funding  agencies  and  organizations, 
with  sophisticated  software  to  provide  rapid  access  to  the  database 
contents,  can  help  improve  the  selection,  management,  and 
evaluation  of  research  programs.  Research  gaps  can  be  identified, 
duplication  of  programs  can  be  minimized,  complementary  and  joint 
programs  can  be  established,  substantial  leveraging  of  other  agency 
programs  can  be  implemented,  and  technology  planning  can  be 
improved  with  better  awareness  of  maturing  research  programs. 

Tailored  databases  which  contain  information  about  the 
structural  relationships  among  projects  and  programs  can  help 
identify  critical  paths  for  development  in  R&D  programs.  This  is 
important  in  allocating  resources  among  programs  in  mission- 
oriented  agencies  and  other  organizations. 

Sophisticated  algorithms  for  manipulating  and  interpreting 
large  technical  textual  databases  would  allow  pervasive  themes  of 
the  databases  to  be  identified,  as  well  as  the  relationships  among 
the  themes  and  sub-themes.  Low  frequency  anomalous  relationships
which  could  be  important  are  identified  easily  with  these 
techniques.  The  algorithms  would  also  allow  identification  of  the 
translations  between  research  areas  and  technology  areas  in  the 
databases,  and  would  provide  guidelines  and  roadmaps  for  increasing 
the  efficiency  of  searching  unfamiliar  databases. 

These  algorithms,  and  subsequent  analyses,  have  the  potential 
of  identifying  emerging  research  and  development  areas  contained 
within  the  databases  but  not  readily  discernible.  The  software  can
also  help  in  taxonomy  construction,  with  the  taxonomy  elements
obtained  'bottom-up'  from  the  database  language,  rather  than
'top-down'  using  an  authoritative  directed  approach.  Many  different
types  of  taxonomies  could  be  constructed  from  the  full  text 
database,  and  relationships  among  the  different  elements  of  the 
different  taxonomies  could  be  obtained.  Finally,  by  looking  at  the 
changes  in  the  structure  of  research  fields  over  time,  the  impact 
of  sponsoring  organization  intervention  can  be  ascertained. 

To  fully  understand  a  research  program,  especially  in  the 
assessment  of  that  program,  evaluators  must  be  cognizant  of  the  large 
body  of  research  being  conducted  throughout  the  world.  In  addition, 
to  fully  understand  the  impacts  of  research  on  different 
technologies,  evaluators  must  be  cognizant  of  the  large  body  of 
existing  and  developmental  technology  throughout  the  world,  and  the 
existing  and  potential  shortcomings  in  those  technologies. 

With  the  advent  of  high  speed  and  high  storage  capacity 
computers,  and  advances  in  database  software  packages,  the  capability 
exists  now  to  make  large  amounts  of  information  available  to 
researchers  and  evaluators.  In  particular,  the  capability  exists  to 
provide  information  about  funded  research  and  technology  development 
programs  being  conducted  throughout  the  world,  as  well  as  information
about  existing  technologies. 

Subsets  of  this  type  of  database  do  exist,  such  as  the  Federal 
multiagency  funded  research  programs  database  developed  by  the 
author.  This  multiagency  database  was  developed  to  identify  research 
gaps  in  the  larger  community;  to  guard  against  potential  duplication 
with  other  organizations;  to  identify  potential  complementary/joint
programs  with  other  organizations;  to  identify  other  agency  programs 
with  potential  for  leveraging;  and  to  identify  projected  maturing 
research  products  which  can  be  utilized  for  technology  planning. 

The  multiagency  database  provides  narrative  and  programmatic 
information  about  research  programs  sponsored  by  many  different 
agencies.  These  agencies  include  the  Department  of  Defense  (Army, 
Navy,  Air  Force,  BMDO,  Independent  R&D-funded  from  defense  contractor 
overhead),  and  non-DOD  Federal  agencies  (NIH,  NSF,  DOE,  NASA,  small
business  innovative  research,  etc.).  Most  of  the  narrative 
descriptions  are  at  the  Work  Unit  (principal  investigator)  level,  but 
some  narrative  descriptions  (principally,  for  the  armed  services 
agencies)  are  at  the  program  level  (where  a  program  is  a  group  of 
principal  investigators).  For  applications  where  linkages  between
work  units  are  important,  program  level  narratives  are  more 
appropriate.  The  database  presently  resides  on  a  desktop  computer 
hard  disk,  but  could  be  accessed  directly  from  the  data  sources  via 
Internet  if  appropriate  drones  and  system  architectures  were 
installed.  This  latter  architecture  would  allow  the  data  which  the 
user  sees  to  be  more  current. 

Two  major  types  of  studies  have  been  performed  with  this 
database.  The  first  is  standard  text  retrieval  searches  to  identify 
programs  of  interest,  usually  in  categories  defined  by  the  end  user. 
The  second  type  of  study  (Database  Tomography)  involves  computational 
linguistics  techniques  to  extract  information  about  the  total 
database  structure.  These  techniques  include  multiword  phrase 
frequency  analyses  for  identifying  pervasive  research  thrust  areas, 
and  multiword  phrase  proximity  analyses  for  identifying  relationships 
among  thematic  research  areas.  These  computational  linguistics 
techniques  were  described  further  in  previous  sections. 
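
The two analyses named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's Database Tomography implementation: the function names, the tokenization rule, and the five-word proximity window are all hypothetical choices made for the example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_frequencies(texts, n=2):
    """Count every n-word phrase across a list of narrative texts."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def phrase_proximity(texts, theme, window=5):
    """Count words found within `window` words of each occurrence of `theme`."""
    theme_words = tokenize(theme)
    n = len(theme_words)
    neighbors = Counter()
    for text in texts:
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == theme_words:
                left = words[max(0, i - window):i]
                right = words[i + n:i + n + window]
                neighbors.update(left + right)
    return neighbors
```

Calling phrase_frequencies(narratives, n=2).most_common(20) on a list of program narratives would list the most pervasive two-word phrases, and phrase_proximity(narratives, "remote sensing") would list the words that cluster around one chosen theme. A practical version would also need to filter common function words, which this sketch leaves out.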

This  database  has  been  of  immense  help  in  assessing  research 
programs,  as  well  as  helping  to  plan  research  programs.  However,  a 
much  larger  and  more  comprehensive  database,  covering  not  only 
research  but  technology  as  described  above,  would  be  of  substantial 
benefit  to  the  research  and  technology  performer  community,  the 
research  and  technology  evaluation  community,  and  the  research  and 
technology  user  community.  Such  a  database  would  involve  the
cooperation  of  many  government  agencies,  and  a  number  of  industrial 
organizations  as  well.  The  requirements  of,  and  planning  for,  such 
a  database  should  be  started  in  the  near  future. 

The  author's  multiagency  database  has  been  in  existence  for 
about  four  years,  and  his  proposals  for  an  expanded  database  as 
described  above  have  been  promulgated  for  almost  that  length  of  time. 
Recently,  a  major  step  towards  this  multiagency  database  goal  has 
been  taken.  The  Rand  Corp.  Critical  Technologies  Institute  has 
developed  a  multiagency  R&D  database  called  RADIUS.  It  provides 
programmatic  descriptions  for  21  federal  agencies  at  five  different
levels  of  resolution. 

As  stated  in  sections  V-A  and  V-B,  central  to  credible  work  in 
predicting  and  tracking  the  diffusion  of  information  from  research  is 
a  database  of  research  products  at  various  evolutionary  stages  which 
can  feed  the  predictive  models.  This  database  of  research  products 
could  be  linked  in  part  with  the  above-proposed  database  of  research
and  technology.  Since  the  research  product  evolutionary  pathways 
transcend  the  research  originating  organization,  and  can  intersect 
all  societal  sectors,  the  cooperation  of  many  public  and  private 
organizations  would  be  required  to  develop  a  database  of  research 
products  in  their  evolutionary  stages.  Development  and  construction 
of  such  a  database  should  start  now. 

V-E.  General 

This  section  discusses  research  required  for  RIA  which 
transcends  any  particular  technique.  The  issues  addressed  are  those 
which  have  hindered  the  acceptability  of  the  RIA  product  for  decades. 

The  first  issue  addressed,  certification  of  RIA  managers,  is  as 
much  an  education  and  training  issue  as  a  research  issue.  Successful 
resolution  of  this  issue  would,  in  the  author's  estimation,  result  in 
a  major  advance  in  the  profession  of  RIA.  In  the  author's 
experience,  most  of  the  people  responsible  for  RIA  in  the  technical 
agencies  and  high-tech  industries  are  engineers  and  scientists  who 
have  converted  from  performing  engineering  and  science  to  assessing 
engineering  and  science.  Their  training  in  assessment  techniques 
ranges  from  minimal  to  nonexistent.  Their  knowledge  of  the  breadth
of  available  techniques,  and  when  to  apply  these  techniques,  is, 
except  for  a  few  notable  cases,  very  limited. 

Yet,  the  tools  available  for  research  assessment,  and  the 
conditions  under  which  these  tools  should  be  applied,  are  no  less
complex  than  the  analogous  diagnostic  tools  and  application 
conditions  available  to  an  M.D.  Internist.  In  fact,  the  research 
assessor's  operating  conditions  may  be  more  complex.  The  Internist 
typically  has  a  series  of  standard  protocols  to  follow  in  arriving  at 
a  diagnosis.  No  suite  of  standard  protocols  is  available  to  the 
research  assessor  today.  How  much  credibility  would  the  diagnosis  of 
an  Internist  have  if  the  Internist  had  training  in  his  discipline 
equivalent  to  that  of  the  average  research  assessor?  The  conclusion 
drawn  here  is  that  in  order  for  research  assessment  to  progress  from 
today's  practice  of  random  application  of  a  few  well-known  techniques 
to  tomorrow's  application  of  a  suite  of  more  sophisticated  approaches 
tailored  to  specific  problems,  the  people  responsible  for  research 
assessment  must  have  appropriate  training. 

Research  should  be  addressed  to  the  types  of  training  which 
would  offer  preparation  for  assessing  research  from  many 
perspectives.  What  are  the  elements  of  successful  research 
assessment,  and  what  are  the  educational  requirements  which  would 
lead  to  successful  research  assessment?  What  would  be  the  contents 
of  the  curriculum,  and  where  would  it  be  offered?  For  many  fields,  such
as  Airline  Pilot,  Brain  Surgeon,  there  are  aptitude  and  personality 
prerequisites.  Are  there  similar  prerequisites  for  a  potentially 
successful  research  assessor?  Finally,  how  should  certification  of
research  assessors  be  done,  and  enforced? 

The  second  issue  addressed  is  research  assessment  quality.  In 
many  fields,  such  as  construction,  surgery,  music,  quality  of  the
product  can  be  ascertained  readily  upon  inspection.  Yet  how  is 
quality  of  an  RIA  ascertained?  One  reads  papers  and  reports  which 
summarize  RIAs,  including  procedures  and  results.  From  these,  it  is 
almost  impossible  to  differentiate  high  from  medium  or  low  quality
RIAs.  How  much  preparation  was  done  by  the  members  of  an  evaluation 
panel  before  the  actual  meeting?  How  much  background  work  did  their 
leader  do,  and  how  intense  was  his  probing,  and  consequently  that  of 
the  panel,  during  the  evaluation  process?  Was  free  discourse  during 
the  proceedings  encouraged,  or  suppressed? 

More  research  is  needed  into  what  constitutes  a  quality 
assessment.  It  is  important  to  understand  how  these  factors  can  be 
communicated  in  a  report,  and  how  they  can  be  identified  by 
independent  readers. 

The  third  generic  issue  is  that  of  motivation  and  associated 
incentives.  This  issue  has  some  overlap  with  the  previous  issue  of 
quality.  The  research  managers  and  administrators,  and  those  with 
responsibility  for  higher  level  oversight,  have  to  be  convinced  of 
the  value  of  RIA  to  their  organizations  for  the  improved  allocation 
of  research  resources.  More  important  than  any  evaluation  criteria 
selected  is  the  dedication  of  an  organization's  management  to  the 
highest  quality  objective  review,  and  the  associated  emplacement  of 
rewards  and  incentives  to  encourage  quality  reviews.  The  team 
assigned  responsibility  to  carry  out  RIA  must  be  motivated  to 
generate  the  highest  quality  product,  not  just  'answer  the  mail',  as 
is  done  in  many  organizations  today.  This  means  selecting  the  best 
suite  of  methods  available  to  accomplish  organizational  objectives, 
and  selecting  the  most  competent  and  objective  individuals  to 
participate  in  the  RIA.  The  RIA  managers  must  be  motivated  to 
examine  the  impact  from  as  many  perspectives  as  possible,  to  gain  the 
most  complete  understanding.  Finally,  the  objectives,  importance, 
and  benefits  of  RIA  must  be  articulated  and  communicated  to  the 
researchers  and  research  managers  at  the  initiation  of  RIA,  so  that 
the  reviewees  will  participate  in  the  RIA  as  fully  and  as 
cooperatively  as  possible. 

What  are  the  best  motivating  factors  for  producing  quality 
research  assessments?  What  are  the  best  incentives?  How  does  one 
ensure  that  the  range  of  individuals  from  upper  management  to  the
person  conducting  the  assessment  remain  motivated  throughout  the 
assessment  process  to  provide  the  highest  quality  product? 

The  final  generic  issue  addressed  is  frequency  and  level  of 
detail  of  RIA.  How  frequently  should  research  be  reviewed  from  a 
cost-effectiveness  viewpoint?  The  more  frequently  research  is 
reviewed,  the  more  chances  exist  to  identify  wayward  research  and 
redirect  the  efforts.  However,  as  was  shown  in  the  text,  costs  of 
research  reviews  are  not  negligible.  There  is  some  sort  of  optimum 
point  where  the  costs  of  performing  the  review  balance  the 
probability  of  achieving  cost  savings  by  identifying  and  re-directing 
or  terminating  wayward  research.  Research  is  required  to  determine 
this  review  frequency,  as  a  function  of  discipline,  organization,
level  of  basic  or  applied,  type  of  performer,  and  other  key
parameters.
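
The trade-off described above can be made concrete with a toy model. Everything in it is an assumption introduced for illustration and is not drawn from the Handbook: a fixed cost per review, a constant yearly chance that a program goes wayward, and wayward work that runs undetected for half a review interval on average. Under those assumptions the expected yearly cost is review_cost / T + wayward_rate * annual_budget * T / 2 for a review interval of T years, and setting the derivative to zero gives the interval that minimizes it.

```python
import math


def annual_cost(interval_years, review_cost, annual_budget, wayward_rate):
    """Expected yearly cost of reviewing every `interval_years` years (toy model)."""
    review_part = review_cost / interval_years
    # Assumption: wayward work runs undetected for half an interval on average.
    waste_part = wayward_rate * annual_budget * interval_years / 2.0
    return review_part + waste_part


def optimal_interval(review_cost, annual_budget, wayward_rate):
    """Interval that minimizes annual_cost, from setting its derivative to zero."""
    return math.sqrt(2.0 * review_cost / (wayward_rate * annual_budget))
```

The sketch only shows that such an optimum exists and how it shifts with its inputs: more expensive reviews lengthen the interval, while larger budgets or higher wayward rates shorten it. Real values of these parameters, and their dependence on discipline and performer type, are exactly what the research called for above would have to supply.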

At  what  level  of  organization  (i.e.,  Principal  Investigator,
Program,  Division,  Discipline,  etc.)  should  reviews  be  performed,  and 
at  what  frequency?  Should  the  same  RIA  approach,  or  combinations  of 
RIA  approaches,  be  applied  at  each  level  of  organization  with  the 
same  degree  of  intensity  and  effort?  Or,  should  the  suites  of  RIA 
techniques  and  review  frequencies  be  functions  of  the  level  of 
organization  being  reviewed?  These  are  key  issues  of  practical 
importance  on  which  negligible  amounts  of  research  have  been 
performed. 

VI.  RIA  -  SUMMARY  AND  CONCLUSIONS

Three  generic  types  of  RIA  approaches  used  by  the  Federal 
government  were  described  (semi-quantitative,  peer  review  and 
quantitative  methods).  Peer  review  is  the  method  used  most
frequently.  All  methods  examined  have  their  unique  shortcomings.  A 
fundamental  problem  is  that  many  research  impact  targets  exist. 
These  include  impact  on:  research  field  itself;  allied  research 
fields;  technology;  systems;  operations;  education;  etc.  The 
strength  of  the  specific  impact  of  the  research  on  each  of  these 
targets  and  the  weighting  assigned  to  the  value  of  the  research 
impact  on  each  of  these  targets  depends  on  the  technical, 
organizational,  and  personal  perspectives  of  the  reviewers.  For 
example,  while  research  proposal  X  may  have  a  very  strong  potential 
impact  on  technology  Y  and  a  very  weak  impact  on  graduate  student 
education,  if  the  evaluators  selected  for  a  particular  review  are 
organizationally  and  personally  inclined  to  assign  high  importance  to 
graduate  student  education,  then  research  proposal  X  will  suffer 
accordingly.  The  many  available  dimensions  which  derive  from  these 
different  perspectives  serve  to  complicate  the  evaluation  process. 
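
The effect of reviewer perspective in the proposal X example can be shown with a small weighted-score sketch. The 0-10 ratings, the two impact targets, and the panel weightings below are hypothetical numbers chosen for illustration; the Handbook does not prescribe a scoring formula.

```python
def weighted_score(impacts, weights):
    """Weighted average of impact ratings over the targets named in `weights`."""
    total_weight = sum(weights.values())
    return sum(impacts[target] * w for target, w in weights.items()) / total_weight


# Hypothetical 0-10 ratings for research proposal X on two impact targets.
proposal_x = {"technology": 9, "education": 2}

# Hypothetical weightings for two differently inclined review panels.
technology_panel = {"technology": 0.8, "education": 0.2}
education_panel = {"technology": 0.2, "education": 0.8}

score_from_technology_panel = weighted_score(proposal_x, technology_panel)
score_from_education_panel = weighted_score(proposal_x, education_panel)
```

With these numbers the same proposal scores roughly 7.6 from the technology-minded panel and roughly 3.4 from the education-minded panel, although nothing about the research itself has changed.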

Much  of  the  research  evaluation  community  has  come  to  believe 
that  simultaneous  use  of  many  techniques  is  the  preferred  approach. 
However,  there  is  little  evidence  of  multiple  technique  use  by  the 
Federal  government  in  impact  assessment,  especially  bibliometrics  to 
support  peer  review.  This  area  is  ripe  for  exploitation. 

A  recent  study  (Averch,  1990)  summarizes  quite  well  the  use  of
research  impact  assessments  by  the  Federal  government.  "Since  1985, 
no  breakthrough  methods  of  any  variety  have  been  invented  that  more 
definitively  reveal  the  ex  post  scientific  or  social  value  of  past 
research  investments  ...  the  evidence  is  sparse  that  there  is  much
payoff  to  public  or  private  sector  R&D  administrators  from  making
greater  use  of  them....  R&D  administrators  do  use  ex  post  evaluations
for  political  and  organizational  purposes,  for  example,  to  convince 
sponsors  that  they  are  interested  in  rational  decision  processes  and 
that  they  are  funding  good  work.  However,  the  research  evaluation 
literature  between  1985-1990  contains  very  few  demonstrations  that 
evaluation  makes  any  difference  at  all  to  the  critical  decisions 
about  the  level  and  allocation  of  scarce  scientific  and  technical 
resources."

Finally,  this  Handbook  has  examined  different  research  impact
assessment  techniques,  and  their  use  by  the  Federal  government.  The 
approach  has  been  to  describe  application  of  the  different 
techniques,  and  focus  on  the  strengths  and  weaknesses  inherent  in  the 
processes.  The  Handbook  did  not  address  the  predictive  reliability 
of  the  processes,  with  the  exception  of  a  short  section  on  the 
predictive  reliability  of  peer  review.  There  is  little  literature 
which  provides  the  basis  for  predicting  which  research 
programs/proposals  will  have  the  desired  downstream  impact.  For 
example,  the  relationship  between  a  proposal's  peer  review  score  or 
a  project's  bibliometric  rating  and  the  downstream  impact  on  an 
organization's  mission  is  not  addressed  in  published  studies.  One 
could  raise  the  question,  as  many  active  researchers  have,  as  to 
whether  there  is  value  to  any  of  these  assessment  techniques,  since 
their  predictive  value  is  unknown.  The  credibility  and 
predictability  of  these  assessment  techniques  are  ripe  topics  for 
research.  A  long  term  tracking  system  for  research  product  evolution 
would  be  required  to  gather  the  necessary  data.  The  system  would 
require  agreement  and  coordination  from  a  number  of  the  larger 
Federal  research  sponsoring  agencies,  and  maybe  from  industrial 
organizations  as  well.  While  such  a  system  would  not  provide 
absolute  answers,  since  tracking  of  the  informal  modes  of  knowledge 
communication  would  be  almost  impossible,  it  would  provide  a  much 
better  picture  of  research  impact  and  its  predictability  than  exists 
now.  With  the  present  state  of  information  storage  and  processing 
capabilities,  research  product  evolution  tracking  is  an  idea  whose 
time  has  come. 

VII.  RIA  OPTIONS  FOR  RESEARCH  SPONSORING  ORGANIZATIONS 

In  this  section,  the  research  evaluation  guidance  recommended
for  Federal  agencies  is  described.  While  the  focus  of  this  section 
is  on  Federal  agency  evaluations,  the  principles  and  implementation 
mechanics  are  sufficiently  broad  to  be  applicable  elsewhere. 

For  more  than  a  decade,  the  author  has  examined  the  research 
evaluation  practices  of  a  number  of  agencies  and  organizations.  This 
section  reflects  the  extraction  of  elements  of  the  best  of  those 
agency  and  organization  practices.  In  addition,  existing  and 
proposed  methods  described  in  prior  sections  are  included  with  the 
extracted  elements.  Managers  interested  in  applying  some  of  the 
recommended  approaches  should  tailor  them  to  the  unique  needs  of 
their  organizations,  to  ensure  that  these  assessment  procedures  are 
compatible  with  their  planning  and  execution  practices. 

For  ongoing  periodically  scheduled  reviews,  a  tri-level  agency 
research  evaluation  and  impact  assessment  procedure  for  continuing 
and  recently  completed  research  is  recommended.  The  main  criteria 
used  at  all  three  levels  are  research  quality  and  mission  relevance. 
The  highest  level  evaluation  examines  the  total  research  management 
organization  as  a  unit,  and  is  an  annual  corporate  level  evaluation 
using  external  reviewers.  The  next  level  evaluation  looks  at 
individual  programs  (where  a  program  is  a  collection  of  individual 
work  units  or  principal  investigators),  and  is  managed  by  the
corporate  level  using  external  experts  to  perform  the  evaluation
triennially.  An  option  is  presented  for  joint  evaluation  of  all 
agencies'  programs  in  a  given  technical  discipline.  The  third  level
evaluation  examines  the  individual  work  units,  and  is  run  less 
formally  by  the  program  managers  periodically  using  internal  and 
external  reviewers. 
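
The tri-level procedure can be written down as a small data structure. This is a sketch under assumptions: the class and field names are hypothetical, years are counted from an arbitrary start, and the work unit level is given no fixed interval because the text says only that it is reviewed periodically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewLevel:
    name: str
    unit_reviewed: str
    reviewers: str
    interval_years: Optional[int]  # None: no fixed interval is stated in the text


LEVELS = [
    ReviewLevel("corporate", "total research management organization", "external", 1),
    ReviewLevel("program", "individual programs", "external", 3),
    ReviewLevel("work unit", "individual work units", "internal and external", None),
]


def reviews_due(years_since_start):
    """Names of the fixed-interval reviews that fall due in the given year."""
    return [
        level.name
        for level in LEVELS
        if level.interval_years and years_since_start % level.interval_years == 0
    ]
```

Under this counting, every year brings a corporate review and every third year adds the program review; work unit reviews are left to the program managers' own schedule and so never appear in the list.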

RECOMMENDATIONS  FOR  FEDERAL  AGENCY  RESEARCH  EVALUATION 

A  national  perspective  of  science  and  technology  was  considered 
in  the  formulation  of  the  recommendations.  Publicly-funded  science
and  technology  should  be  viewed  as  supporting  the  larger  long  range 
goals  of  the  S&T  sponsoring  agencies.  The  national  objective  for 
recommending  research  evaluations  should  be  to  ensure  that  a
horizontally  (cross  agency)  and  laterally  (cross  discipline) 
integrated  national  research  program  will  lead  to  a  global 
optimization  for  achieving  the  aggregated  agency  long  range  goals. 
Fundamental  to  this  global  optimization  is  the  existence  on  a 
national  (and  perhaps  on  a  readily  accessible  international)  scale  of 
an  advanced  pool  of  high  quality  knowledge  in  many  research  fields. 
As  retrospective  studies  of  research  evolution  to  technology  have 
shown  [Kostoff,  1993d],  "an  advanced  pool  of  knowledge  must  be 
developed  in  many  fields  before  synthesis  leading  to  an  innovation 
can  occur.  The  real  critical  path  to  the  innovation  is  more  likely 
the  knowledge  pool  than  any  particular  entrepreneur".  Horizontal, 
lateral,  and  vertical  (cross  development  phases)  integration  issues 
impact  research  evaluation  procedures  relating  to  global 
optimization,  and  will  be  discussed. 

The  target  of  global  optimization  for  achieving  aggregated 
agency  long  range  goals  leads  to  two  top-level  requirements  which 
must  be  considered  in  formulating  research  evaluation 
recommendations.  Is  the  research  of  high  intrinsic  quality  and 
horizontally  and  laterally  integrated  among  the  funding  agencies  and 
balanced  across  the  different  disciplines  to  ensure  an  optimal 
national  pool  of  high  quality  knowledge,  and  is  the  research 
vertically  integrated  within  the  agencies  to  ensure  that  long  range 
agency  objectives  will  have  a  maximal  chance  of  being  impacted? 
Horizontal  and  lateral  integration  tend  to  be  associated  with  QUALITY 
(is  the  job  being  done  right?)  and  vertical  integration  with 
RELEVANCE  (is  the  right  job  being  done?),  with  the  ultimate 
assessment  issue  being  QUALITY-RELEVANCE  (is  the  right  job  being  done
right?).


HORIZONTAL  COUPLING/INTEGRATION

Under  the  present  national  structure  of  public  research
sponsorship,  responsibility  for  funding  any  research  discipline  is 
divided  up  among  different  Federal  agencies.  Each  agency  focuses  on 
sponsoring  the  research  necessary  to  impact  the  agency's  unique  long 
range  objectives.  Because  of  the  unified  nature  of  research,  the 
different  components  of  a  research  discipline  funded  by  the  different 
agencies  are  related,  and  there  are  multiple  relationships  among 
different  disciplines.

From  a  national  perspective,  the  aggregated  research  components 
in  any  research  discipline  should  be  complementary.  There  should  be 
minimal  duplication,  and  there  should  be  minimal  gaps  in  the  research 
requirements  and  opportunities  addressed  for  the  funding  available. 
Thus,  there  should  be  some  measure  of  horizontal  coupling  among  the 
agencies  to  ensure  the  research  discipline  components  are 
complementary  on  a  national  scale. 

The  degree  of  horizontal  coupling  can  be  divided  into  three 
categories:  horizontal  awareness,  horizontal  coordination,  and 
horizontal  integration.  In  horizontal  awareness,  an  agency's 
research  managers  are  aware  of  other  agencies'  efforts  in  the 
discipline  and  plan  their  programs  accordingly,  but  there  is  no  joint 
planning,  execution,  or  evaluation  within  the  discipline.  In 
horizontal  coordination,  there  may  be  some  combination  of  joint 
planning,  execution,  and  evaluation  at  different  intensity  levels. 
In  horizontal  integration,  joint  efforts  are  strengthened  while 
allowing  each  agency  to  retain  autonomy  for  managing  the  research 
necessary  to  optimize  its  overall  objectives. 

LATERAL  COUPLING/INTEGRATION

From  the  national  program  perspective,  different  research 
disciplines  which  have  intrinsic  relationships  should  be  conducted 
and  managed  in  a  complementary  manner.  Thus,  there  should  be  some 
measure  of  lateral  (cross-discipline)  intra-  and  inter-agency 
coupling  to  ensure  that  intrinsically  related  disciplines  are 
complementary  on  a  national  scale. 

The  degree  of  lateral  coupling  can  be  divided  into  three 
categories:  lateral  awareness,  lateral  coordination,  and  lateral 
integration.  In  lateral  awareness,  research  discipline  managers  are 
aware  of  other  intra-  and  inter-agency  efforts  in  related  disciplines 
and  plan  their  programs  accordingly,  but  there  is  no  joint  planning, 
execution,  or  evaluation  among  the  related  disciplines.  In  lateral 
coordination,  there  may  be  some  combination  of  joint  planning, 
execution,  and  evaluation  of  related  disciplines  at  different 
intensity  levels.  In  lateral  integration,  joint  efforts  among 
related  intra-  and  inter-agency  disciplines  are  strengthened  while 
allowing  each  agency  to  retain  autonomy  for  managing  the  research  to 
optimize  its  overall  objectives. 

VERTICAL  COUPLING/INTEGRATION

Analogous  to  the  horizontal  and  lateral  coupling  categories  are 
vertical  coupling  categories.  While  the  main  focus  of  vertical 
coupling  is  within  a  given  agency,  vertical  coupling  can  transcend 
agencies.  Because  of  the  unified  nature  of  research,  products  of 
research  from  one  agency  can  transition  to  other  agencies'  programs. 
Thus,  planners  of  vertically  coupled  R&D  programs  in  one  agency  must 
be  continually  aware  of  existing  and  planned  R&D  programs  of  other 
agencies.  The  key  point  to  be  made  is  that  vertical  coupling  is  not 
independent  of  horizontal  or  lateral  coupling.  Vertical  integration 
is  linked  with  horizontal  and  lateral  integration.  One  major  focus
of  agency  research  assessment  from  the  national  perspective  should  be
the  degree  to  which  DIAGONAL  INTEGRATION  (horizontal,  lateral,  and
vertical  integration)  is  being  achieved. 

The  vertical  coupling  categorization  is  vertical  awareness, 
vertical  coordination,  and  vertical  integration.  In  vertical 
awareness,  the  research  and  development  managers  are  aware  of  each 
other's  efforts  in  the  vertical  structure  and  plan  their  programs 
accordingly,  but  there  is  no  joint  planning,  execution,  or  evaluation 
within  the  structure.  In  vertical  coordination,  there  is  some 
combination  of  different  degrees  of  joint  planning,  execution,  and 
evaluation  within  the  vertical  structure. 

Vertical  integration  (VI)  in  an  S&T  program  is  a  linkage  among 
related  programs  in  different  phases  of  development.  Research  and 
development  programs  which  have  a  common  goal  are  run  as  a  unit. 
There  could  be  time  differences  and  lags  between  the  various 
programs,  or  they  could  be  run  with  different  degrees  of  concurrence. 
A  research  component  of  a  vertically  integrated  program  may  be 
undergoing  execution.  Its  development  component  may  be  in  the  early 
planning  stage,  with  execution  well  into  the  future.  Some  of  the 
higher  category  components  may  thus  exist  as  planning  wedges  while 
the  lower  category  components  are  being  executed.  The  development 
process  is  not  linear  because  of  the  inherent  feed-forward  and
feedback  loops  within  and  among  categories.  As  Attachment  1  shows,  to
achieve  total  VI,  the  program  has  to  be  planned  and  executed  in  a 
vertically  integrated  manner,  and  has  to  be  assessed  using  the  same 
taxonomy  as  was  used  for  planning  and  execution.  Because  a 
vertically  integrated  program  in  one  agency  could  draw  upon  programs 
managed  by  other  agencies,  the  vertical  linkages  operate  under  the 
constraint  that  each  agency  must  have  management  autonomy  to  ensure 
that  its  overall  objectives  are  met  in  the  most  expeditious  manner. 

ISSUES  OF  QUALITY  AND  RELEVANCE  ASSESSMENT 

The  issues  to  be  considered  in  evaluating  quality  and  relevance 
include  the  following.  Should  quality  and  relevance  be  evaluated  as 
a  unit,  or  evaluated  separately?  Should  quality  in  a  discipline  be 
evaluated  within  one  agency  only,  or  should  all  the  agencies 
sponsoring  a  discipline  be  evaluated  as  a  unit?  Should  the  quality 
evaluation  depend  on  the  type  of  vertical  coupling  in  an  agency? 
Should  these  issues  have  different  answers  depending  on  the 
evaluation  level  of  resolution  (agency,  program,  project)? 

SPECIFIC  RECOMMENDATIONS  FOR  AGENCY  RESEARCH  EVALUATION  GUIDANCE 

The  specific  recommendations  for  research  evaluation  and  impact 
assessment  guidance  to  Federal  agencies  will  now  be  presented.  The 
recommendations  should  be  viewed  as  a  threshold  level,  and  the
agencies  could  certainly  do  other  types  of  evaluations  or  more 
complex  variants  than  those  recommended.  IT  IS  RECOMMENDED  IN  THE 
NOMINAL  CASE  THAT  THE  AGENCIES  DO  A  THREE  LEVEL  RESEARCH  EVALUATION.

CORPORATE  LEVEL  REVIEW

The  highest  would  be  a  corporate  level  review  of  how  the  agency 
performs  research.  If  the  agency  has  a  separate  research  unit,  then 
the  unit  should  be  evaluated  as  an  integrated  whole.  If  research  is 
vertically  integrated  with  development,  then  the  research  should 
preferably  be  evaluated  as  part  of  a  total  agency  R&D  review. 
However,  the  agency  should  have  the  option  to  evaluate  the  research 
unit  as  an  integrated  whole.  The  charter  of  this  highest  level 
assessment  would  be  to  review,  at  the  corporate  level,  general 
policy,  organization,  budget,  and  programs.  An  example  of  this  is 
the  corporate  NIST  review  [NIST,  1991a].

Total  inputs  and  outputs  would  be  examined.  Inputs  would 
include  overall  funding  and  people,  and  outputs  would  include  the 
different  types  of  research  products.  Integrated  bibliometric 
indicators  could  be  presented.  Examples  of  Hindsight-type  recent
downstream  impacts  could  be  shown,  as  well  as  results  of  macrolevel 
econometric  studies  related  to  research  benefits.  Overall  research 
management  processes  would  be  examined,  such  as  selection,  execution, 
review,  and  technology  transfer  of  research.  The  overall  investment 
strategy  which  drives  the  research  investment  would  be  evaluated,  and 
would  include  different  perspectives,  crosscuts,  and  breakdowns  of 
the  total  research  program,  such  as  technical  discipline  allocation, 
performer  allocation,  and  end  use  allocation  (see  Attachment  2  for  a 
more  detailed  discussion  of  the  corporate  investment  strategy).  The
integration  of  the  research  objectives  with  the  larger  agency 
objectives  would  be  assessed.  Outstanding  corporate  level  issues 
would  be  addressed.  The  evaluators  would  include,  but  not  be  limited 
to,  representatives  of  the  stakeholder,  customer,  and  user  community 
whose  potential  conflicts  with  the  agency  are  minimal.  To  address 
horizontal  integration,  representatives  of  other  Federal  agencies 
would  be  included  as  evaluators. 

DISCIPLINE  REVIEW  AT  PROGRAM  LEVEL 
Program  Definition 

The  second  level  would  be  peer  review  of  a  discipline  or 
management  unit  at  the  program  level.  Fiscally,  a  program  is  a 
collection  of  components.  These  elements  could  be  subprograms, 
projects,  or  individual  work  units  (PIs).  Conceptually,  a  program  is
greater  than  the  sum  of  its  components,  just  as  the  living  human  body
is  greater  than  the  sum  of  its  component  molecules.  A  program 
includes  the  intelligence  or  inherent  logic  which  links  the 
components  to  each  other  and  to  the  program's  objectives,  just  as  the 
living  human  body  includes  the  intelligence  which  links  the  molecules 
to  each  other  and  to  the  homeostatic  operation  of  the  body.  A 
program  could  be  single  research  discipline  intra-  or  inter-agency; 
multiple  discipline  intra-  or  inter-agency;  multiple  discipline 
vertically  integrated  intra-  or  inter-agency;  multiple  discipline 
multi-agency  multi-national;  or  other  variants  of  the  above.  The 
nominal  program  is  assumed  to  be  intra-agency;  multi-agency  programs
are  discussed  later  in  this  Handbook.  The  nominal  review  is  assumed
to  be  intra-agency.  Some  organizations  review  by  disciplines,  some
organizations  review  by  multi-discipline  management  unit,  and  in  some 
organizations  disciplines  coincide  with  management  units. 

Program  Review  Options

The  guiding  principle  for  review  options  is  that  evaluation 
should  occur  along  the  same  structures  and  taxonomies  by  which  the 
research  is  planned  and  executed.  If  the  agency  has  a  separate 
research  unit,  then  the  discipline  should  be  evaluated  as  an 
integrated  whole.  In  the  nominal  intra-agency  review,  quality  and 
relevance  could  be  evaluated  concurrently  or  separately,  as  desired 
by  the  agency. 

If  research  is  vertically  integrated  with  development,  then  the 
research  could  be  evaluated  as  part  of  a  total  vertical  structure  R&D 
review  (characteristics  of  an  assessment  of  such  a  vertically 
integrated  structure  are  discussed  in  Attachment  5)  or  as  part  of  the 
discipline,  as  desired  by  the  agency.  In  the  nominal  intra-agency 
review,  quality  and  relevance  could  be  evaluated  separately  or 
concurrently.  A  key  conclusion  to  be  drawn  from  this  paragraph  is 
that  research  evaluation  recommendations  must  take  into  account  how 
research  is  structured,  integrated,  and  managed  within  an  agency. 

Desirable  characteristics  of  a  high  quality  peer  review  were 
listed  previously,  and  are  repeated  in  Attachment  3.  The  review 
protocol  principles  suggested  for  this  level  were  listed  previously, 
and  are  repeated  in  Attachment  4.  The  research  programs  should  be 
reviewed  on  a  triennial  cycle,  based  on  the  DOE  BES  evaluation 
results  of  1982,  and  on  other  agency  practices. 

The  following  considerations  apply  to  a  concurrent  quality  and 
relevance  review.  The  reviewers  should  be  external,  have  minimal 
conflicts  with  the  program  being  reviewed,  and  should  be  selected 
with  expertise  in  all  facets  of  the  research  and  potential  impact 
areas.  To  evaluate  the  degree  of  horizontal  coupling  in  the  nominal 
intra-agency  review,  representatives  of  other  Federal  agencies  should
be  considered  as  reviewers,  or  at  least  should  be  invited  to 
participate  as  audience  members.  Thus,  the  review  panel  will  be  a 
heterogeneous  mixture  of  research  and  relevance  experts  who  can 
address  the  many  facets  of  the  science  and  areas  of  potential  impact. 
Approaches  for  selecting  a  review  panel  are  presented  in  Attachment 
6. 

In  the  nominal  concurrent  quality  and  relevance  review,  quality 
and  relevance  should  be  the  main  review  criteria.  Research  quality 
criteria  should  include  research  merit,  research  approach, 
productivity,  and  team  quality.  Relevance  criteria  should  include 
short  term  impact  (transitions  and/or  utility),  long  term  potential
impact,  and  some  estimate  of  the  probability  of  success  of  attaining 
each  type  of  impact.  Some  issues  to  be  kept  in  mind  by  the  reviewers 
during  the  presentations  are  listed  in  Attachment  7. 

There  should  be  an  overview  showing  how  the  larger  management 
unit  (Division,  Department,  etc.)  in  which  the  programs  are  housed 
integrates  into  the  total  organization,  and  how  the  management  unit's 
objectives  relate  to  those  of  the  larger  organization.  Then,  the 
investment  strategy  of  the  larger  management  unit  should  be  presented
in  detail.  This  would  include  the  relative  program  priorities,  the
actual  investment  allocation  to  the  different  programs,  and  the 
rationale  for  the  investment  allocation.  Finally,  for  each  program 
presentation,  the  investment  strategy  for  its  thrust  areas  should  be 
presented. 

The  investment  strategy  is  perhaps  the  most  crucial  part  of  a 
program  review,  and  deserves  further  discussion  here.  While 
investment  is  the  allocation  of  resources  among  the  program 
components,  the  investment  strategy  is  the  rationale  for  the 
prioritization  and  allocation  of  resources  among  the  program 
components.  The  optimal  investment  strategy  for  a  program,  which 
should  be  a  focal  point  of  an  assessment,  is  that  allocation  and 
rationale  which  will  produce  the  most  mission  relevant  high  quality 
research  for  impacting  the  program's  objectives.  This  will  depend  on 
the  viewpoint  of  the  assessor,  and  in  particular  how  the  assessor 
limits  the  role  of  the  research  within  the  national  perspective. 

The  optimal  investment  strategy  results  from  a  timely  confluence 
of  research  requirements  (top-down  driven)  and  promising  research 
opportunities  (bottom-up  driven).  Further,  promising  research
opportunities  result  from  a  timely  confluence  of  advances  in  theory, 
instrumentation,  new  experiments,  new  algorithms,  and  computers. 
Finally,  research  requirements  result  from  a  timely  confluence  of 
domestic  and  foreign,  political  and  economic,  strategic  and  tactical 
advances.  All  of  the  above  factors  should  be  included  in  a 
presentation  of  the  investment  strategy. 

While  the  emphasis  is  on  peer  review,  bibliometric  and  other 
types  of  indicators  should  be  utilized.  In  the  protocol,  it  is
recommended  strongly  that  sufficient  background  material  be  supplied 
to  the  reviewers  before  the  review.  This  would  include 
organizational  descriptive  material,  narrative  descriptions  of  each
program  to  be  reviewed,  and  descriptive  material  of  each  work  unit  in 
the  program.  It  would  also  prove  useful  to  include  bibliometric 
output  indicators  for  each  program,  with  interpretive  analytical 
material.  This  could  include  refereed  papers,  patents,  awards  and 
honors,  presentations,  etc.  It  would  be  useful  to  include  narrative 
material  on  related  programs  in  other  agencies  and  industry.  It 
would  be  useful  to  include  Hindsight-type  results  of  research  that
was  funded  years  ago  in  the  discipline  under  review  and  which 
recently  came  to  fruition  in  a  system  or  commercial  technology.  To 
track  these  in  a  credible  manner  would  require  the  research  product 
evolution  tracking  database  referred  to  previously.  This  concept  was 
described  by  the  author  in  a  little  more  detail  in  a  published  paper, 
and  is  reproduced  in  Attachment  8.  Sample  guidance  for  a  concurrent 
quality/relevance  program  review  is  presented  in  Attachment  9.

In  the  detailed  guidance  example  in  Attachment  9,  it  is 
recommended  that  program  managers  include  roadmaps  with  their 
technical  presentations.  It  would  be  very  valuable  if  the  roadmaps 
were  provided  as  background  material  as  well.  These  roadmaps  provide 
the  global  context  in  which  the  program  is  being  performed.  Their 
retrospective  components  show  how  aware  the  program  manager  is  of  the 
breadth  and  depth  of  the  intellectual  heritage  of  the  present 
program;  the  present  components  reflect  the  awareness  of  the  program
manager  of  the  wide  range  of  science  and  technology  areas  available
to  complement  his  program,  and  the  degree  of  coordination  and 
leveraging  in  which  his  program  is  involved;  the  prospective  roadmap 
components  provide  indication  of  the  program  manager's  vision  and 
willingness  to  take  risks,  and  his  intrinsic  understanding  of  how 
results  from  other  science  and  technology  programs  could  be  exploited 
to  enhance  and  expand  the  potential  of  his  program.  A  certain  amount 
of  time  and  reflection  is  required  to  understand  and  fully  appreciate 
the  implications  of  a  comprehensive  roadmap,  and  the  reviewers  should 
receive  these  roadmaps  well  in  advance  of  the  actual  review  date. 
For  the  reader  interested  in  obtaining  more  information  about  diverse 
aspects  of  roadmaps,  a  comprehensive  document  has  been  prepared 
replete  with  concepts,  principles,  and  examples  [Kostoff,  1997d].

Finally,  although  the  following  concept  has  never  been  tested  to 
the  author's  knowledge,  it  would  be  valuable  to  incorporate  the 
results  of  journal  manuscript  reviews  in  the  research  program  peer 
review  process.  Attachment  19  outlines  the  benefits  of  such  a 
proposal,  and  outlines  how  it  could  be  accomplished. 

Inter-Agency  Review  Option

The  nominal  review  is  assumed  to  be  intra-agency.  If 
coordinations  could  be  effected,  a  more  useful  approach  from  the 
integrated  national  perspective  would  be  to  review  technical 
disciplines  on  a  national  level.  This  would  allow  horizontal 
integration  to  be  assessed  more  readily.  The  focus  of  this
inter-agency  review  would  be  research  quality.

Under  this  option,  a  separate  intra-agency  review  would  address 
mission  relevance  of  the  technical  disciplines,  either  by  themselves 
or  as  part  of  vertically  integrated  structures.  For  purposes  of  the 
present  discussion,  it  is  assumed  that  the  mission  relevance  review 
for  each  technical  discipline  would  precede  the  inter-agency  research 
quality  review  for  that  discipline.  As  a  result,  the  research 
requirements  necessary  to  address  mission  needs  would  be  identified 
by  the  time  of  the  inter-agency  review. 

One  scenario  for,  say,  an  inter-agency  review  of  Chemistry  would 
be  the  following.  All  agencies  which  had  major  Chemistry  programs 
would  be  reviewed  together.  One  third  of  the  total  Chemistry 
discipline  would  be  reviewed  each  year.  A  large  panel  consisting  of 
research  experts  would  be  convened.  The  research  experts  would 
participate  in  all  program  reviews  except  where  obvious  conflicts  of 
interest  arose.  This  type  of  review  would  give  a  perspective  on  the 
national  program  unattainable  by  any  single  agency  review.  One  major 
outcome  would  be  identification  of  Chemistry  science  research  gaps  in 
the  national  program. 

One  complication  of  such  an  inter-agency  review  arises  from  the 
lateral  integration  issue.  For  a  program  which  is  planned  and 
executed  as  a  multi-category  (R&D)  and/or  multi-research  discipline 
program,  is  it  reasonable  to  isolate  a  particular  research  discipline 
and  have  it  reviewed  for  quality  as  part  of  a  multi-agency  review? 
If  a  multi-category  program  had  a  separate  relevance  review  within 
the  agency,  then  a  detailed  quality  review  of  each  of  the  categories 
could  and  should  be  performed.  For  research  which  is  conducted  as
part  of  a  multi-disciplinary  program,  the  research  quality  of  the
multi-disciplinary  unit  should  be  examined.  This  would  require  that
the  research  expertise  on  the  panel  not  be  confined  to  the  major
discipline  being  evaluated,  but  should  include  experts  from  the  other 
research  areas.  This  is  true  even  for  intra-agency  multi-discipline 
reviews.  For  the  inter-agency  nominal  single  discipline  reviews 
suggested,  integrated  multi-discipline  reviews  would  require  that 
experts  from  disciplines  other  than  the  dominant  discipline  be 
brought  in  for  specific  multi-disciplinary  programs.  While  this 
would  make  the  logistics  of  the  inter-agency  review  more  complex,  if 
programs  with  similar  multi-discipline  profiles  are  grouped 
appropriately,  the  review  could  be  performed  more  efficiently. 

If  this  full  multi-discipline  program  review  proves  to  be  too 
complex  because  of  time  and  reviewer  logistics  constraints,  then  only 
the  single  discipline  component  of  the  multi-discipline  program  could 
be  reviewed  at  the  inter-agency  single  discipline  review.  The  other 
disciplines  of  the  multi-discipline  program  would  be  summarized  to 
provide  context  for  the  single  discipline  component  being  reviewed, 
but  they  would  not  be  subject  to  review  at  this  time.  They  would  be 
reviewed  when  their  own  single  discipline  inter-agency  reviews 
occurred.  If  an  objective  of  research  assessments  on  a  national 
scale  is  to  ascertain  whether  programs  are  complementary  from 
horizontal,  lateral,  and  vertical  perspectives,  then  there  will  be 
many  complexities  of  the  type  just  described  which  will  have  to  be 
addressed.  Having  many  different  types  of  integrated  programs  will 
require  special  conditions  and  creativeness  to  review  each  type. 

Continuing  on  the  inter-agency  Chemistry  review  example,  at  the 
end  of  the  inter-agency  review,  the  panel  would  meet  to  identify 
promising  research  opportunities  in  the  Chemistry  discipline.  The 
panel  would  be  well-positioned  to  generate  these  opportunities,  for 
they  would  have  heard  the  total  program,  and  seen  the  research  gaps. 
They  would  provide  recommendations  to  the  agency  program  managers  at 
the  end  of  their  meeting  for  the  promising  Chemistry  research 
opportunities  to  be  pursued. 

Immediately  following  this  panel  meeting,  the  Chemistry  program 
managers  from  the  agencies  would  convene  a  joint  planning  meeting. 
With  the  research  requirements  based  on  mission  needs  and  the 
research  opportunities  based  on  science  needs  in  hand  from  the  intra- 
and  inter-agency  reviews,  the  panel  would  outline  the  structure  of  a 
complementary  horizontally-integrated  multiagency  Chemistry  program.
This  very  preliminary  structure  would  have  to  be  iterated  within  each 
agency  over  a  period  of  months  to  ensure  coordination  with  other  R&D 
programs  within  the  agency  and  to  ensure  funding  priorities  and 
constraints  are  observed.  Nevertheless,  this  initial  structure  would 
provide  a  cohesive  and  comprehensive  springboard  from  which  the  final 
integrated  programs  could  be  developed. 

If  this  whole  process  is  scheduled  well,  then  immediately 
following  the  planning  meeting  would  be  the  annual  American  Chemical 
Society  meeting.  The  overall  results  of  the  intra-  and  inter-agency 
Chemistry  reviews,  the  research  requirements  and  opportunities 
identified,  and  the  very  preliminary  program  plans  would  be  presented 
to  the  Society  at  the  opening  plenary  session.  This  could  last  a
day,  and  valuable  feedback,  dialogue,  and  interchange  could  be
generated.  The  above  process  has  the  desired  feature  of  integrating 
planning  and  assessment,  and  brings  in  the  total  community  as  process 
participants. 


DISCIPLINE  REVIEWS  AT  WORK  UNIT  LEVEL 

The  third  level  of  review  would  be  at  the  work  unit  (PI)  level. 
The  nominal  case  would  be  less  formal  intra-agency  reviews  conducted 
at  the  program  manager  level,  but  if  an  agency  wanted  a  more  formal 
review  conducted  at  the  corporate  level,  such  as  the  DOE  BES  review, 
they  would  certainly  have  the  prerogative  to  do  so.  If  an  agency 
wanted  this  more  formal  work  unit  level  review,  then  it  would  be  most 
efficient  to  combine  it  with  a  program  level  review.  Most  of  the 
program  level  issues  described  above,  including  the  potential  for 
multiagency  reviews  at  more  focused  workshops  or  conferences,  are 
applicable  and  need  not  be  repeated. 

An  example  of  a  high  quality  well-documented  and  established 
approach  for  evaluating  existing  work  units  is  the  procedure 
developed  by  DOE.  Its  protocol  is  reproduced  in  Attachment  11. 

This  procedure  can  be  modified  for  evaluating  proposed  projects, 
and  existing  and  proposed  programs.  The  author  participated  in  the 
development  of  the  DOE  process,  and  modified  it  for  program 
evaluation  when  he  came  to  ONR.  In  the  modification,  the  author 
served  as  Chairman  of  the  review  panels,  received  individual  inputs 
from  the  panelists,  and  wrote  the  final  report.  The  panels  were 
typically  larger  than  the  DOE  panels,  averaging  about  a  dozen  people 
in  size,  and  had  more  representation  from  the  customer,  stakeholder, 
user,  and  impactee  communities.  Some  of  the  review  criteria  and 
definitions  were  modified. 

Attachments  12-15  contain  sample  evaluation  scoring  forms,  used 
by  the  author  in  the  evaluation  of  existing  and  proposed  programs. 
Attachment  12  is  a  long  form  used  in  the  evaluation  of  existing 
programs.  The  long  form  has  two  purposes.  It  requires  the  reviewer 
to  consider  quantitatively  the  different  components  of  quality  and 
relevance.  It  also  allows  the  sponsoring  agency  to  perform  analyses 
on  the  scores,  to  identify  which  component  criteria  the  reviewers 
thought  were  most  important.  Attachment  16  contains  an  analysis  of 
reviewers'  scores  for  proposed  programs  (see  Kostoff  [1992a],  Appendix
II  for  a  more  detailed  treatment) . 
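
One possible form of such a score analysis is sketched below in Python. It is not the method of Attachment 16, which is not reproduced here. It assumes, hypothetically, that each reviewer records an overall score alongside the component scores, and it ranks the component criteria by the strength of their correlation with that overall score. The criterion names, the scores, and both function names are illustrative assumptions only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A criterion every reviewer scored identically carries no signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_criteria(component_scores, overall_scores):
    """Rank criteria by |correlation| with the overall score, strongest first."""
    ranking = [
        (name, pearson(scores, overall_scores))
        for name, scores in component_scores.items()
    ]
    return sorted(ranking, key=lambda item: abs(item[1]), reverse=True)


# Made-up scores from five hypothetical reviewers (1 = poor, 5 = excellent).
overall = [3, 4, 5, 2, 4]
components = {
    "research_merit": [2, 3, 4, 1, 3],
    "productivity": [4, 4, 3, 3, 5],
    "long_term_impact": [3, 5, 4, 2, 3],
}

ranking = rank_criteria(components, overall)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

A criterion that tracks the overall score closely is one the reviewers implicitly weighted heavily. With only a handful of reviewers such correlations are suggestive at best, and the sketch is a starting point rather than a substitute for the fuller treatment cited above.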

If  an  ad  hoc  review  of  an  existing  program  is  required,  where 
there  are  specific  issues  of  concern  rather  than  the  generic  issues 
characteristic  of  periodically  scheduled  reviews,  then  a  different 
evaluation  form  type  may  be  used.  Attachment  17  contains  the 
criteria  and  forms  that  could  be  used  for  ad  hoc  evaluation  of  a  more 
applied  research  program.  Here,  the  focus  is  on  the  specific  issues 
of  concern.  Ratings  could  be  applied  to  each  issue;  they  are  not 
shown  in  this  attachment.  The  issues  shown  in  Attachment  7  would 
also  be  utilized  in  this  review. 

Many  labs,  companies,  and  sponsoring  agencies  have  special 
programs  which  consist  of  many  small,  high  risk,  finite  duration 
projects.  These  'seed-money'  projects,  because  of  their  special
nature  and  small  size,  require  special  review  techniques  for
cost-effective  assessment.  A  protocol  for  reviewing  these  types  of
projects  is  contained  in  Attachment  18.  To  streamline  these  reviews, 
expanded  use  of  journal  paper  peer  reviewers'  comments  during
project  reviews  is  proposed  in  Attachment  19. 

MULTI-AGENCY  PROGRAM  ASSESSMENTS 

The  previous  section  dealt  mainly  with  intra-agency  programs. 
There  are,  however,  programs  which  are  inter-agency,  and  they  can  be 
fairly  large  programs.  If  these  programs  are  planned  as  a  unit,  they 
should  be  assessed  as  a  unit.  The  issues  which  were  discussed  for 
program  evaluation  in  the  previous  sections,  where  intra-agency 
programs  were  considered  as  the  evaluation  unit,  apply  here  equally 
well  to  the  inter-agency  program  considered  as  the  evaluation  unit. 
These  issues  include  concurrent  or  separate  quality  and  relevance 
reviews,  and  intra-  or  inter-agency  quality  reviews. 

OTHER  CONSIDERATIONS 

High  quality  assessments  at  any  of  the  above  levels  are 
expensive  and  time  consuming.  Attachment  10  contains  a  peer  review 
cost  example  presented  previously.  The  agencies  should  make  every 
effort  to  minimize  program  disruption  and  review  costs  while 
performing  credible  assessments. 

A  practical  consideration  concerns  the  length  of  the  review.  It 
is  desirable  to  have  the  same  group  of  reviewers  present  for  the 
total  review  of  the  areas  in  which  they  have  expertise.  This  allows 
normalization  and  continuity  to  occur.  However,  in  the  case  of  a 
program  review,  the  larger  the  program,  the  more  review  time  it  will 
require.  It  becomes  more  difficult  to  retain  high  quality  reviewers 
as  the  length  of  the  review  increases. 

There  are  at  least  three  approaches  to  circumvent  this  problem.
First,  the  program  could  be  broken  into  focused  subprograms,  and  each 
subprogram  could  be  reviewed  separately  with  more  focused  experts. 
Second,  the  program  could  have  its  components  aggregated,  and  the 
full  program  could  be  reviewed  by  the  same  panel  at  a  lower  level  of 
detail.  Third,  the  quality  and  relevance  components  could  be  divided 
for  separate  reviews.  In  the  inter-agency  review  example  for
Chemistry  described  above,  if  the  review  required  weeks,  then  more
focused  reviews  of  smaller  units  might  be  appropriate. 

The  length  of  the  review  will  be  governed  by  the  desired 
resolution  detail  of  the  technical  area  presentations.  Two 
indicators  are  of  value  in  the  discussion  of  resolution  detail. 
These  are  Spatial  Presentation  Intensity  (SPI)  and  Temporal 
Presentation  Intensity  (TPI).  The  SPI  is  the  ratio  of  total  dollar
value  of  the  program  being  reviewed  to  the  number  of  reviewers,  and 
the  TPI  is  the  ratio  of  total  dollar  value  of  the  program  being 
reviewed  to  total  hours  allotted  to  the  review. 
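
The two ratios defined above can be written down directly. The following is a minimal sketch in Python; the function and argument names are illustrative assumptions rather than part of the Handbook, and the $12M program with ten reviewers and eight hours of review is a hypothetical case.

```python
def spatial_presentation_intensity(program_dollars, num_reviewers):
    """SPI: total dollar value of the program divided by the number of reviewers."""
    if num_reviewers <= 0:
        raise ValueError("num_reviewers must be positive")
    return program_dollars / num_reviewers


def temporal_presentation_intensity(program_dollars, review_hours):
    """TPI: total dollar value of the program divided by total review hours."""
    if review_hours <= 0:
        raise ValueError("review_hours must be positive")
    return program_dollars / review_hours


# Hypothetical review: a $12M program, ten reviewers, eight hours of presentations.
spi = spatial_presentation_intensity(12_000_000, 10)
tpi = temporal_presentation_intensity(12_000_000, 8)
print(f"SPI = ${spi:,.0f} per reviewer")
print(f"TPI = ${tpi:,.0f} per hour")
```

Lower values of either ratio mean that more reviewers or more hours are devoted to each program dollar, i.e., a finer resolution of review.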

For  the  most  detailed  review,  a  review  at  the  Principal 
Investigator  (PI)  level,  the  TPI  should  range  from  about  $125K  to 
$250K  per  hour  (one  to  two  projects  per  hour),  and  the  SPI  should
range  from  about  $100K  to  $250K  per  reviewer.  These  reviews  could
cover  technical  quality  and  agency  relevance.  For  the  second  level 
detail  of  review,  a  program  review  which  would  cover  both  in-depth 
technical  quality  and  agency  relevance,  both  the  SPI  and  TPI  should 
range  between  $1M  and  $1.5M  ($/reviewer,  $/hour).  The  third  level
detail  of  review,  a  program  review  which  would  be  a  presentation 
aggregation  of  the  second  level  of  review  and  would  cover  agency 
relevance  only,  would  have  both  the  SPI  and  TPI  range  between  $4M  and 
$5M  ($/reviewer,  $/hour).  The  TPI  estimates  are  based  on  review
durations  of  one  or  more  days,  while  the  SPI  estimates  are  based  on 
one-day  reviews.  If  the  same  reviewers  are  used  for  multi-day 
reviews,  the  SPI  numbers  increase  sharply.  Thus,  if  an  agency  wanted
to  do  an  in-depth  technical  quality  and  agency  relevance  review  at 
the  program  level  of  a  $50M  program,  then  about  35-50  hours  of 
presentation  time  would  be  required.  If  a  different  panel  were  used 
each  day,  then  about  35-50  reviewers  would  be  required,  whereas  if 
the  same  panel  were  used  for  the  total  review,  then  realistically 
about  ten  reviewers  would  be  required. 
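
The arithmetic behind the $50M example can be checked with a short calculation. The sketch below assumes the second-level range of $1M to $1.5M quoted above for both the TPI and the SPI; the helper name is hypothetical. The computed bounds (roughly 33 to 50) are consistent with the rounded figures of about 35-50 in the text.

```python
def count_range(program_dollars, intensity_low, intensity_high):
    """Return (minimum, maximum) hours or reviewers implied by an intensity range.

    A higher intensity (more dollars per hour or per reviewer) implies fewer
    hours or reviewers, so the high end of the range gives the minimum count.
    """
    if not 0 < intensity_low <= intensity_high:
        raise ValueError("need 0 < intensity_low <= intensity_high")
    return program_dollars / intensity_high, program_dollars / intensity_low


PROGRAM_DOLLARS = 50e6                   # the $50M program from the text
LEVEL2_LOW, LEVEL2_HIGH = 1.0e6, 1.5e6   # second-level SPI and TPI range

min_hours, max_hours = count_range(PROGRAM_DOLLARS, LEVEL2_LOW, LEVEL2_HIGH)
min_reviewers, max_reviewers = count_range(PROGRAM_DOLLARS, LEVEL2_LOW, LEVEL2_HIGH)

# Effective SPI if a single panel of ten sits through the whole review.
single_panel_spi = PROGRAM_DOLLARS / 10

print(f"Presentation time: {min_hours:.1f} to {max_hours:.1f} hours")
print(f"Reviewers, new panel each day: {min_reviewers:.1f} to {max_reviewers:.1f}")
print(f"Effective SPI with one ten-person panel: ${single_panel_spi:,.0f} per reviewer")
```

The last figure, $5M per reviewer, is several times the one-day guideline of $1M to $1.5M, which illustrates how sharply the SPI rises when the same panel covers a multi-day review.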

Many  agencies  do  quantitative  evaluations  of  their  research. 
The  actions  they  take  appear  to  be  on  the  basis  of  the  qualitative 
results,  but  it  is  not  clear  how  use  is  made  of  the  quantitative 
results.  A  method  for  translating  the  quantitative  scores  into 
funding  reallocations  on  a  uniform,  consistent  agency-wide  basis, 
making  use  of  existing  computer  hardware  and  software  capabilities, 
is  presented  in  Attachment  20. 

Finally,  there  is  considerable  interest  at  present  in  expanding 
the  use  of  quantitative  indicators  of  research  impact.  While  the 
quantitative  methods  described  in  the  first  part  of  this  Handbook 
focus  on  the  magnitude  of  the  indicators,  there  do  not  appear  to  be 
methods  which  attempt  to  quantify  the  patterns  which  underlie  the
indicator  magnitudes.  For  example,  citation  counts  are  tabulated  in 
citation  analysis,  but  few,  if  any,  studies  address  the  patterns  of 
citation  impact  on  different  fields,  journals,  institutions,  etc. 

One  concept  which  has  been  used  in  statistical  thermodynamics 
and  information  theory  to  quantify  and  analyze  patterns  is  entropy. 
Attachment  21  addresses  the  potential  use  of  entropy  and  entropy 
gradients  in  different  aspects  of  research  evaluation  and  impact 
assessment.  Two  examples  are  presented,  and  it  is  shown  that  in  some 
cases  supplementation  of  entropy  with  indicators  such  as  moments  of 
the  pattern  distribution  functions  is  quite  useful.

Attachment  1  -  REQUIREMENTS  FOR  VERTICAL  INTEGRATION

For  total  vertical  integration  to  be  achieved,  two  conditions 
are  required.  Both  intrinsic  vertical  integration  and  extrinsic 
vertical  integration  are  necessary.  Intrinsic  vertical  integration 
occurs  when  program  planning  and  execution  have  been  done  in  a 
vertically  integrated  manner,  and  the  vertical  integration  goals  have 
been  achieved.  Extrinsic  vertical  integration  is  the  verification  of 
intrinsic  vertical  integration.  Extrinsic  vertical  integration 
results  from  positive  assessment  of  a  program  using  coordination  and 
integration  measurement  criteria. 

There  are  many  aspects  to  bringing  a  vertically  integrated 
program  to  fruition,  including  planning,  execution/transition,  and
assessment.  Unless  planning,  execution,  and  assessment  are 
coordinated  among  each  other,  among  performing  organizations,  and 
among  levels  of  development,  there  is  little  chance  that  an  agency's 
total  program  will  be  performed  and  perceived  in  a  vertically 
integrated  mode. 

If  a  program  is  not  planned  and  executed  as  vertically 
integrated  (not  intrinsically  VI),  it  will  not  be  verified  as
vertically  integrated  by  assessment  (not  extrinsically  VI).  If  a
program  is  planned  and  executed  as  vertically  integrated
(intrinsically  VI),  it  may  or  may  not  be  verified  as  vertically
integrated  by  assessment.  The  outcome  will  depend  on  how  the 
assessment  is  performed.  To  achieve  extrinsic  VI  in  this  case, 
planning,  execution,  and  assessment  have  to  be  based  on  the  same  or 
a  readily  connected  taxonomy  (a  taxonomy  is  a  classification  scheme).
In  particular,  if  planning  is  done  using  one  taxonomy,  and  assessment 
is  done  using  an  unrelated  taxonomy,  then  the  assessment  results  will 
be  predetermined  to  show  an  uncoordinated  and  non-integrated  S&T
effort.  Thus,  to  achieve  total  VI,  the  program  has  to  be  planned  and 
executed  in  a  vertically  integrated  manner,  and  has  to  be  assessed 
using  the  same  taxonomy  as  was  used  for  planning  and  execution. 
Because  a  vertically  integrated  program  in  one  agency  could  draw  upon 
programs  managed  by  other  agencies,  the  vertical  linkages  operate 
under  the  constraint  that  each  agency  must  have  management  autonomy 
to  ensure  that  its  overall  objectives  are  met  in  the  most  expeditious 
manner.

Attachment  2  -  CORPORATE  INVESTMENT  STRATEGY 

The  investment  strategy  for  an  agency  should  be  an  iterative
procedure  which  converges  when  the  'top-down'  requirements  driven 
component  synchronizes  with  the  'bottom-up'  opportunities  component. 
The  'top-down'  component  of  the  investment  strategy  document  should 
start  by  describing  the  generic  principles  which  guide  the  agency's 
investments.  It  should  then  summarize  the  major  global  and  domestic 
events  which  influence  the  direction  of  the  agency's  investments. 
This  provides  the  context  in  which  the  actual  investment  strategy  is 
generated.  Then  the  prioritized  mission  requirements  are  specified. 
A  translation  is  made  from  these  prioritized  mission  requirements  to 
a  prioritized  set  of  research  and  technology  requirements. 

The  above  procedure  constitutes  the  first  step  in  the  'top-down' 
component.  In  parallel,  the  'bottom-up'  component  is  developed.  The 
research  areas  in  which  the  agency  has  strategic  interest  should  be 
assessed.  The  confluence  of  leading-edge  research  theory,
experiments,  computational  capabilities,  and  instrumentation  should
be  identified  to  provide  the  most  timely  promising  research 
opportunities  of  interest  to  the  agency. 

The  technical  program  areas  which  address  these  research  and 
technology  requirements  and  opportunities  are  developed,  and  the
priorities  among  these  areas  are  established.  Funding  allocations 
among  these  areas  are  made,  and  the  rationale  for  the  funding 
allocations  based  on  these  priorities  is  provided  in  detail. 

SAMPLE  INVESTMENT  PRINCIPLES 

The  major  strategic  generic  investment  principles  which  guide 
the  agency's  investments  can  be  summarized  in  the  following  manner:

1.  Maintain  diversified  portfolio  of  broad  spectrum  R&D  to  deal
with  future  uncertainties,  develop  breakthroughs,  and  rapidly  exploit 
foreign  breakthroughs 

2.  Allocate  X%  of  the  investment  to  high-risk  high-payoff
programs 

3.  Allocate  Y%  of  the  investment  to  requirements-driven  programs 

4.  Provide  stable  funding  to  maintain  program  integrity 

5.  Focus  investment  on  medium  to  long-term  payoffs 

6.  Leverage  other  agency/government  programs  when  possible 

7.  Maintain  awareness  of  other  agency/government  S&T  programs 
when  selecting,  reviewing,  and  terminating  S&T  programs 

8.  Use  external  reviewers  from  the  larger  technical  community 
when  selecting  and  reviewing  agency  technical  programs 

9.  Plan,  execute,  and  review  agency  technical  programs  to 
maximize  vertical  and  lateral  integration 

10.  Support  training  of  future  agency  mission  essential 
workforce 

11.  Maintain  critical  industrial  base  which  could  be  activated 
in  times  of  national  emergency 

12.  Maintain  threshold  S&T  infrastructure  which  focuses  on
essential  S&T  programs  and  provides  an  in-house  window  to  the  larger
technical  community

INVESTMENT  STRATEGY  CROSSCUTS  AND  BREAKDOWNS 

In  tandem  with  the  principles  stated  above  is  the  need  to  show
how  the  principles  are  expressed  in  actual  funding  allocations.  To
display  this  coupling,  a  number  of  different  crosscuts  through  the
program  are  necessary.  A  few  of  these  crosscuts  are  suggested  below. 

CROSSCUTS 

1.  Science  Disciplines 

2.  Mission  Areas

3.  Core  Competencies

4.  Claimants

5.  Performers

6.  Revolutionary/Evolutionary

7.  Basic/Applied

8.  Level  of  Risk**

9.  Congressional/Administration  Priorities/Thrusts

10.  Multi-Agency  Applications

11.  Coordination  with  other  Agencies/Organizations/Countries

The  crosscut  with  the  double-asterisk**,  Level  of  Risk,  is  very
important  in  the  evaluation  of  Federal  programs  and  is  typically  very 
difficult  for  agencies  to  evaluate.  The  remainder  of  this  section 
discusses  the  importance  of  identifying  research  risk,  presents 
results  of  experiments  in  categorizing  program  risk,  and  shows  how 
these  categories  can  be  utilized  to  support  investment  decisions. 

BACKGROUND 

Investment  in  research  projects  is  intrinsically  a  risky 
venture.  Not  only  is  there  the  technical  risk  associated  with  the 
research  approach  and  the  downstream  technology  development 
uncertainties,  but  there  are  risks  associated  with  the  market  and 
geopolitical  environments  which  affect  the  financing  and  eventual 
utilization  of  the  resultant  technologies.  A  full  risk  analysis  for 
purposes  of  research  project  selection  and  continuation  should 
incorporate  all  these  risk  factors. 

The  present  section  focuses  on  identification  of  technical  risk 
of  projects  in  their  research  phase.  In  particular,  it  focuses  on 
one  component  of  technical  risk,  the  research  technical  risk. 
Research  risk  is  defined  as  the  probability  that  pre-defined  research 
objectives  will  be  attained  at  a  specified  cost  at  a  specified  time. 
This  risk  component  is  typically  very  difficult  for  any  research 
sponsoring  organization  to  estimate.  To  develop  a  more  rigorous  and 
normalized  approach  to  research  technical  risk  level  estimation,  the 
author  has  examined  hundreds  of  research  program  narratives  in 
conjunction  with  the  program  managers,  and  has  utilized  different 
approaches  to  classify  the  research  program  risk  levels.  These 
experiments  with  categorizing  risk  have  shown  that  the  following 
seven major risk categories provide a comprehensive taxonomy:

1) Large extrapolations;

2)  Difficult  data  gathering; 

3)  Additional  complexities  required; 

4)  High  precision  required; 

5)  Breakthroughs  required; 

6)  Theories  insufficient; 

7)  Record  of  past  difficulties. 

INTRINSIC  DIFFERENCES  BETWEEN  HIGH  AND  LOW  RISK  PROGRAMS 

The  high-risk  programs  differ  from  the  low-risk  programs  because 
larger  steps  into  the  unknown  are  required  to  achieve  the  research 
objectives.  The  basic  mechanisms  which  would  allow  the  attainment  of 
the  research  objectives  for  high-risk  programs  have  greater 
uncertainty than for low-risk programs. There is some overlap among
these risk categories: many of the programs could be placed in more
than one of the categories, and some of the categories could be
subsumed under other categories. In particular, category 7 (Record of
past difficulties) could be included in many of the other categories,
as could category 5 (Breakthroughs required). Nevertheless, a
discussion  of  all  seven  of  these  categories  provides  a  deeper 
understanding  of  research  risk  than  is  possible  from  the  definitions 
alone. 


SEVEN  MAJOR  CATEGORIES  OF  RISK 

1)  Large  extrapolations 

In this category, the risks arose from the large extrapolations
required to proceed from present levels of understanding to the
attainment of specified research targets. There could be
extrapolations in size (e.g., achieve plasma stabilities over long
distances after stabilities over short distances have been obtained),
and/or in time (extend energy transfer measurements from the
nanosecond regime to the femtosecond regime), and/or in operational
parameters (understand material response in extended portions of the
frequency spectrum), and/or in geographical regions (take
measurements in the hostile North Atlantic environment after
measurements in warmer latitudes have been completed).

2)  Difficult  data  gathering 

The risks in this category arose from the difficulty of
obtaining useful data. The data gathering environment could be harsh
and hostile (e.g., environmental measurements during space vehicle
atmospheric re-entry), and/or the signal-to-noise ratio could be
small, and/or the instrument resolution might not be sufficient,
and/or the experiments could be very expensive (e.g., winter
experiments in the Arctic). In addition, the data required could be
very sporadic (e.g., fast-rising storm data), the appropriate
variables to measure might be unknown, the instruments may have been
unproven in the required operational environment (e.g., an instrument
which worked well in a laboratory is now required to operate in a much
harsher aircraft environment), and the instruments may not have been
sufficiently ruggedized for the operational environment to provide
useful data. In some cases, the scales of the data to be taken might
not be known; there may be questions as to whether the correct
instruments were being utilized. Finally, the volumes of data
required for a useful experiment may be beyond the capacities of
present instruments, and in this case, as well as in many others,
instrument breakthroughs may be required.

3)  Additional  complexities  required 

In  this  category,  the  incorporation  of  different  types  of 
complexities  in  the  research  problem  greatly  increased  the 
uncertainty of its resolution. Issues arising from the different
component disciplines might require simultaneous resolution; i.e.,
multiple discipline advances might be required, and coordinated teams
of experts from multiple disciplines might be required. There could
be  uncertainty  as  to  whether  all  relevant  processes  were  being 
modeled.  Multi-scale  interaction  and  coupling  mechanisms  might  not  be 
known.  In  some  cases,  the  complexity  of  the  problem  could  verge  on 
being  unmanageable.  Because  of  the  multidiscipline  interrelationships 
in  some  cases,  there  could  be  questions  as  to  whether  one  objective 
of  the  multidisciplinary  effort  could  be  attained  without 
compromising  other  objectives. 

4)  High  precision  required 

This  risk  factor  had  two  components.  First,  the  data  accuracy 
controls  could  be  too  demanding.  For  example,  a  system  of  many 
elements  could  require  unacceptably  high  operational  tolerances  for 
each  component  before  the  overall  system  could  operate  as  predicted. 
Second,  the  process  data  could  be  sufficiently  variable  that 
predictive  models  could  not  be  constructed  to  produce  reasonable 
estimates. 

5)  Breakthroughs  required 

This  factor  is  the  quintessence  of  high  risk,  especially  if 
breakthroughs  resulting  in  orders  of  magnitude  improvement  are 
required,  and  permeates  the  other  risk  factors.  Types  of 
breakthroughs  which  could  be  required  include  totally  new  materials, 
new  numerical  methods,  and  new  computing  capabilities. 

6)  Theories  insufficient 

This  factor  concerns  the  upgrading  of  present  theories.  The 
present  theories  might  not  be  sufficient  to  incorporate  the 
additional  phenomena.  Theories  might  be  required  to  incorporate 
phenomena which were heretofore unmodeled. There may be insufficient
understanding or identification of all the process phenomena required
to produce a universal theory or approach. Theories developed in
other disciplines for other purposes may have to be tried. Models may
not exist to explain experimental observations. Theories may have to
encompass discrete and continuum behavior. Convergent algorithms may
not exist, and errors resulting from assumptions or data sources may
not be estimable.

7) Record of past difficulties

This  factor  may  be  a  derived  quantity,  i.e.,  a  program  has  a  'record 
of  past  difficulties'  because  of  one  or  more  of  the  other  risk 
factors.  Much  of  this  factor  relates  to  the  absence  of  any  positive 
track record in achieving the desired research objectives. Many past
attempts may have failed (e.g., boron propellant, supersonic
combustion). The approach chosen might go against conventional wisdom
and against recent trends in the field. The approach may be unproven at
any  level,  and  the  feasibility  of  the  new  approach  may  not  be 
obvious.  Only  a  qualitative  understanding  may  exist,  and  multiple 
approaches  may  be  required  due  to  uncertainty  of  operation  of  any 
single  approach.  While  different  parameter  values  may  have  been 
obtained  separately,  all  required  parameter  values  may  not  have  been 
attained  in  one  system. 

UTILIZATION  OF  RISK  LEVELS  IN  PROGRAM  ASSESSMENT  AND  SELECTION 

The private sector uses quantified representations of risk
extensively in its decision-making process. As an example, consider
one of the more popular evaluation approaches for project selection,
Net Present Value (NPV). This process discounts all of a project's
estimated costs and benefits to today's dollars, using a discount
rate.

One  approach  used  to  incorporate  total  risk  into  NPV  is  to 
increase  the  discount  rate  with  increasing  risk.  As  the  rate  (and 
risk)  increases,  short-term  payoffs  become  more  and  more  attractive. 
This  is  one  reason  that  basic  research,  with  its  attendant  high 
levels  of  risk  and  uncertainty  and  long  payoff  time  horizons,  has 
declined  under  industrial  sponsorship  for  the  last  few  decades.  In 
the  increasingly  competitive  global  environment,  companies  find  it 
increasingly  difficult  to  justify  funding  investments  with  low 
discounted  values  [Kostoff,  1996c]. 
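
The effect of a risk-adjusted discount rate on short-term versus
long-term payoffs can be sketched numerically. The following is a
minimal illustration only: the function name npv, the cash flows, and
the two discount rates are hypothetical and are not taken from the
cited studies.

```python
def npv(cash_flows, rate):
    """Discount yearly net cash flows (year 0 first) to today's dollars."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


# Two hypothetical projects, each costing 1.0 (million dollars) today.
short_term = [-1.0, 0.0, 2.0]              # pays 2.0 after 2 years
long_term = [-1.0] + [0.0] * 14 + [6.0]    # pays 6.0 after 15 years

# Assumed rates: 5 percent (low risk) and 15 percent (risk-adjusted).
for rate in (0.05, 0.15):
    print(f"rate={rate:.2f}  short={npv(short_term, rate):+.3f}  "
          f"long={npv(long_term, rate):+.3f}")
```

Under these assumed numbers, the long-term project has the larger NPV
at the low rate but a negative NPV at the higher, risk-adjusted rate,
while the short-term project remains positive at both rates.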

Another approach to incorporating total risk into NPV is to keep
the discount rate at some low-risk or risk-free level (such as a
U.S. bond rate), and incorporate the added risk factor into the
expected level of the project's payoff [Kostoff, 1983]. Thus, if a
project is estimated to have a 50 percent probability of achieving a
ten million dollar payoff, with all risk factors being included, then
the expected value of its payoff is five million dollars.
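
A minimal sketch of this expected payoff approach follows. The 50
percent probability and ten million dollar payoff come from the text;
the function name risk_weighted_npv, the project cost, the risk-free
rate, and the payoff year are hypothetical values chosen only for
illustration.

```python
def risk_weighted_npv(cost, payoff, p_success, risk_free_rate, years):
    """Discount the expected payoff at a risk-free rate and subtract today's cost.

    Risk enters through p_success, not through the discount rate.
    """
    expected_payoff = p_success * payoff
    return expected_payoff / (1.0 + risk_free_rate) ** years - cost


# Expected payoff from the text: 50 percent chance of ten million dollars.
expected = 0.5 * 10_000_000
print(expected)

# Hypothetical cost (1 million), rate (4 percent), and horizon (10 years).
print(round(risk_weighted_npv(1_000_000, 10_000_000, 0.5, 0.04, 10)))
```

Keeping the discount rate fixed separates the time value of money from
the probability of success, so each can be examined on its own.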

This expected payoff value approach to incorporating risk into
NPV will be used to illustrate how a project's risk can be quantified
utilizing the above risk taxonomy. Assume a project at the research
proposal  stage  has  a  potential  ten  million  dollar  payoff.  For 
purposes  of  this  discussion,  neglect  the  non-technical  components  of 
risk.  Assume  the  total  technical  risk  can  be  divided  into  two 
components,  research  risk  and  technology  risk.  Research  risk  has 
been  defined  above.  Technology  risk  is  defined  here  as  the 
probability that the commercial system's technical performance
objectives can be attained (assuming the research cost/performance/
schedule objectives have been attained successfully) at a specified
cost at a specified time. Thus, the total risk (expressed as a
probability of success) will be the product of the research and
technology risks.

Assume two of the above research risk categories (#4 and #2)
contribute  to  the  research  risk.  If  these  categories  are  independent 
(e.g.,  a  very  high  precision  component  is  required  [#4],  and  it  will 
be  part  of  an  experiment  which  operates  in  a  very  hostile  environment 
[#2]),  then  the  research  risk  will  be  the  product  of  these  category 
probabilities.  Thus,  if  the  probability  of  success  in  achieving  the 
high  precision  component  is  one  out  of  two,  and  the  probability  of 
successful  operation  in  the  hostile  environment  is  one  out  of  two, 
then  the  research  probability  of  success  will  be  one  out  of  four.  If 
the  categories  are  not  fully  independent,  then  more  complex 
probability  analyses  are  required. 
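
The independent case, and the simplest departure from it, can be
sketched with the general product rule P(A and B) = P(A) * P(B given A).
This is an illustrative assumption about how a dependent case might be
handled, not a method stated in the text; the function name
joint_success and the conditional probability of 0.8 are hypothetical.

```python
def joint_success(p_first, p_second_given_first):
    """Probability that both categories succeed: P(A) * P(B given A)."""
    return p_first * p_second_given_first


# Independent case from the text: P(B given A) equals P(B) = 1/2.
independent = joint_success(0.5, 0.5)

# Hypothetical dependent case: success on precision makes success in the
# hostile environment more likely (assumed conditional probability 0.8).
dependent = joint_success(0.5, 0.8)

print(independent, dependent)
```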

To compute the total technical risk in the case where the
categories are independent, assume there is a one in four chance that
the research objectives can be attained on cost and schedule (the
result of the previous paragraph), and, if these objectives are
attained, there is a one in three chance that the technology
objectives can be attained on cost and schedule. Then, from the
vantage point of the research proposal stage, there is a one in
twelve chance that the potential ten million dollar payoff will be
realized, and the expected value of the payoff in this case is
approximately 833 thousand dollars.
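
The arithmetic of this example can be checked with a short sketch. The
probabilities and payoff are those given in the text; the function and
variable names are illustrative only.

```python
from fractions import Fraction
from math import prod


def chained_probability(stage_probs):
    """Multiply stage success probabilities, each conditional on the prior stage."""
    return prod(stage_probs, start=Fraction(1))


# Research risk: categories #4 and #2, assumed independent, one in two each.
p_research = chained_probability([Fraction(1, 2), Fraction(1, 2)])

# Technology risk, given that the research objectives were attained.
p_technology = Fraction(1, 3)

p_total = chained_probability([p_research, p_technology])
potential_payoff = 10_000_000
expected_value = p_total * potential_payoff

print(p_research, p_total, round(float(expected_value)))
```

Exact fractions are used so that the one in four and one in twelve
figures appear without rounding; only the final dollar value is rounded.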

For  fundamental  research,  either  of  these  quantitative  economic 
approaches  has  to  be  employed  cautiously.  Risk  tends  to  be  high,  and 
the  uncertainty  in  the  projected  cost  and  benefit  streams  is  high,  so 
that any numerical results are very uncertain [Kostoff, 1995a]. For
development projects, the uncertainty is reduced, and the results of
parametric studies are substantially more credible.

Attachment 3 - DESIRABLE CHARACTERISTICS OF QUALITY PEER REVIEW

The desirable characteristics of a peer review can be summarized
as [Chubin, 1994]:

1.  an  effective  resource  allocation  mechanism; 

2.  an  efficient  resource  allocator; 

3.  a  promoter  of  science  accountability; 

4.  a  mechanism  for  policymakers  to  direct  scientific  effort; 

5.  a  rational  process; 

6.  a  fair  process; 

7.  a  valid  and  reliable  measure  of  scientific  performance. 

High quality peer reviews require as a minimum the conditions
summarized from Ormala [1989]:

1.  The  method,  organization  and  criteria  for  an  evaluation 
should  be  chosen  and  adjusted  to  the  particular  evaluation  situation; 

2. Different levels of evaluation require different evaluation
methods;

3.  Program  and  project  goals  are  an  important  consideration  when 
an  evaluation  study  is  carried  out; 

4.  The  basic  motive  behind  an  evaluation  and  the  relationships 
between  an  evaluation  and  decision  making  should  be  openly 
communicated  to  all  the  parties  involved; 

5.  The  aims  of  an  evaluation  should  be  explicitly  formulated; 

6.  The  credibility  of  an  evaluation  should  always  be  carefully 
established; 

7.  The  prerequisites  for  the  effective  utilization  of  evaluation 
results  should  be  taken  into  consideration  in  evaluation  design. 

Assuming  these  considerations  have  been  taken  into  account, 
three  of  the  most  important  intangible  factors  for  a  successful  peer 
review  are:  Motivation,  Competence,  and  Independence  [Kostoff, 
19941].  The  review  leader's  motivation  to  conduct  a  technically 
credible  review  is  the  cornerstone  of  a  successful  review.  The 
leader selects the reviewers, summarizes their comments, guides the
questions  and  discussions  in  a  panel  review,  and  makes 
recommendations  about  whether  the  proposal  should  be  funded.  The 
quality  of  a  review  will  never  go  beyond  the  competence  of  the 
reviewers.  Two  dimensions  of  competence  which  should  be  considered 
for  a  research  review  are  the  individual  reviewer's  technical 
competence  for  the  subject  area,  and  the  competence  of  the  review 
group  as  a  body  to  cover  the  different  facets  of  research  issues 
(other  research  impacts,  technology  and  mission  considerations  and 
impacts, infrastructure, political and social impacts). The quality
of a review is limited by the biases and conflicts of the reviewers.
The biases and conflicts of the reviewers selected should be known to
the leader and to each other.

Attachment 4 - REVIEW PROTOCOL FOR SUCCESSFUL PEER REVIEW


This  section  contains  a  protocol  developed  by  the  author  for  the 
conduct  of  successful  peer  review  research  evaluations  and  impact 
assessments.  The  main  aims  of  the  protocol  are  to  ensure  that  the 
final  assessment  product  has  the  highest  intrinsic  quality  and  that 
the  assessment  process  and  product  are  perceived  as  having  the 
highest  possible  credibility.  The  protocol  elements  are: 

1.  The  objectives  of  the  assessment  must  be  stated  clearly  and 
unambiguously  at  the  initiation  of  the  assessment  by  the  highest 
levels  of  management,  and  the  full  support  of  top  management  must  be 
given  to  the  assessment.  In  turn,  the  objectives,  importance,  and 
urgency of the assessment must be articulated and communicated down
the  management  hierarchy  to  the  research  managers  and  performers 
whose  research  is  to  be  assessed,  and  the  cooperation  of  these 
reviewees  in  the  conduct  of  the  assessment  must  be  enlisted  at  the 
earliest  stages  of  the  assessment; 

2.  The  final  assessment  product,  the  audience  for  the  product, 
and  the  use  to  be  made  of  the  product  by  the  audience  should  be 
considered  carefully  in  the  design  of  the  assessment; 

3. One person should be assigned to manage the assessment at the
earliest stage, and this person should be given full authority and
responsibility for the assessment;

4. The assessment manager should report to the highest
organizational  level  possible  in  order  to  ensure  maximum  independence 
from  the  research  units  being  assessed; 

5.  The  reviewers  should  be  selected  to  represent  a  wide  variety 
of  viewpoints,  in  order  to  address  the  many  different  facets  of 
research  and  its  impact.  The  reviewers  should  be  independent  of  the 
research  units  being  evaluated,  and  independent  of  the  assessing 
organization where possible. The objectives of, and constraints on
(if any), the assessment should be communicated to the reviewers at
the initial contact;

6.  Maximum  background  material  describing  the  research  to  be 
assessed,  related  research  and  technology  development  sponsored  by 
external  organizations,  the  organization,  and  other  factors  pertinent 
to  the  assessment,  should  be  provided  to  the  reviewers  as  early  as 
possible  before  the  review.  This  will  allow  the  reviewers  and 
presenters  to  use  their  time  most  productively  during  the  review; 

7. Recommendations resulting from the assessment should be
tracked to ensure that they are considered and implemented, where
appropriate. For research programs, planning, execution, and review
are linked intimately. Feedback from the review outcomes to planning
for the next cycle should be tracked to ensure that the
review/planning coupling is operable.

Attachment 5 - PROTOCOL FOR VERTICALLY INTEGRATED PROGRAM RELEVANCE
ASSESSMENT 

The  main  focus  of  a  vertically  integrated  program  assessment  is 
to  review  program  coordination  and  integration,  and  relevance  to 
agency  mission  requirements,  but  not  detailed  technical  quality.  The 
following  protocol  has  been  developed  to  focus  on  a  vertically 
integrated  program  relevance  assessment,  where  program  can  be 
interpreted  as  a  single  program  or  group  of  programs  centered  around 
a  vertically  structured  theme. 

The  protocol  from  Attachment  4  is  repeated  here  with  some 
additions.  These  additions  are  shown  in  capital  letters.  Some  of 
the  additions  apply  equally  well  to  quality  and  relevance 
assessments.  Those  additions  which  apply  mainly  to  vertically 
integrated  program  relevance  assessments  are  underlined  as  well. 

1)  Communication  of  Assessment  Objectives 

The  objectives  of  the  assessment  must  be  stated  clearly  and 
unambiguously  at  the  initiation  of  the  assessment  by  the  highest 
levels  of  management,  and  the  full  support  of  top  management  must  be 
given  to  the  assessment.  In  turn,  the  objectives,  importance,  and 
urgency  of  the  assessment  must  be  articulated  and  communicated  down 
the  management  hierarchy  to  the  research  managers  and  performers 
whose  research  is  to  be  assessed,  and  the  cooperation  of  these 
reviewees  in  the  conduct  of  the  assessment  must  be  enlisted  at  the 
earliest  stages  of  the  assessment; 

2)  End  Use  of  Assessment  Product 

The  final  assessment  product,  the  audience  for  the  product,  and 
the  use  to  be  made  of  the  product  by  the  audience  should  be 
considered  carefully  in  the  design  of  the  assessment; 

3)  Management  Structure  for  Assessment 

One  person  should  be  assigned  to  manage  the  assessment  at  the 
earliest  stage,  and  this  person  should  be  given  full  authority  and 
responsibility  for  the  assessment; 

The  assessment  manager  should  report  to  the  highest 
organizational  level  possible  in  order  to  ensure  maximum  independence 
from  the  research  units  being  assessed; 

4)  Criteria  for  Reviewer  Selection 

The  reviewers  should  be  selected  to  represent  a  wide  variety  of 
viewpoints,  in  order  to  address  the  many  different  facets  of  research 
and  its  impact.  The  reviewers  should  be  independent  of  the  research 
units  being  evaluated,  and  independent  of  the  assessing  organization 
where possible. The objectives of, and constraints on (if any), the
assessment should be communicated to the reviewers at the initial
contact;

THE TERMS OF REFERENCE SHOULD SPECIFY WHAT IS DESIRED FROM THE
REVIEWERS. A TIMELINE OF CRITICAL ASSESSMENT EVENTS SHOULD BE
GENERATED SHORTLY AFTER THE TERMS OF REFERENCE ARE APPROVED.

5) Guidance to Presenters

PRECISE GUIDANCE SHOULD BE PROVIDED TO THE PRESENTERS AT THE
EARLIEST STAGES OF THE ASSESSMENT. THIS WILL GIVE THEM ADEQUATE TIME
TO GENERATE A COORDINATED SET OF PRESENTATIONS AND TO GIVE SUFFICIENT
'DRY RUNS' UNTIL THEY CONVERGE TO AN ACCEPTABLE PRODUCT. SAMPLE
GUIDANCE (BOLDED) FOR A HYPOTHETICAL THREE LEVEL HIERARCHICAL
PRESENTATION STRUCTURE FOLLOWS.


THE PURPOSE OF THE REVIEW WILL BE FIVEFOLD: ASSESS TECHNICAL
QUALITY, FOCUS, SCOPE, AND BALANCE OF AGENCY S&T PROGRAMS IN SUPPORT
OF MISSION REQUIREMENTS; ASSESS THE RESPONSIVENESS AND COMPLETENESS
OF THESE PROGRAMS IN RELATION TO EXISTING AND ANTICIPATED AGENCY
MISSION NEEDS; ASSESS HOW THESE PROGRAMS ARE COORDINATED AND
INTEGRATED, BOTH HORIZONTALLY AND VERTICALLY; IDENTIFY DUPLICATION
AND/OR GAPS IN THESE PROGRAMS IN RELATION TO AGENCY MISSION
REQUIREMENTS; ASSESS LEVERAGING BY AGENCY OF OTHER PROGRAMS, BOTH
INTERNAL AND EXTERNAL.


THERE WILL BE THREE LEVELS OF SPEAKERS MAKING THE TECHNICAL
PROGRAM PRESENTATION. THE LEVEL 1 SPEAKER, WHO MANAGES THE VERTICAL
STRUCTURE, WILL PRESENT THE VERTICAL STRUCTURE OVERVIEW. THE
REMAINDER OF THIS GUIDANCE PERTAINS TO THE SPEAKERS AT THE NEXT TWO
LEVELS. THE LEVEL 2 SPEAKERS, WHO ARE SUB-MANAGERS OF THE VERTICAL
STRUCTURE, WILL PRESENT OVERVIEWS OF THE MAIN VERTICAL STRUCTURE
COMPONENTS. IN THE AGENDA, A NUMBER OF SUB-AREAS WITH IDENTIFIED
SPEAKERS ARE LISTED UNDER EACH OF THE MAIN AREAS. THE SPEAKERS
LISTED FOR THESE SUB-AREAS ARE IDENTIFIED AS THE LEVEL 3 SPEAKERS.
EACH OF THE LEVEL 2 SPEAKERS SHOULD MEET WITH THE LEVEL 3 SPEAKERS AS
SOON AS IS POSSIBLE, TO COORDINATE THE PRESENTATIONS. IT IS EXPECTED
THAT THE LEVEL 3 SPEAKERS WILL ADDRESS A COMBINATION OF SCIENCE AND
TECHNOLOGY PROGRAMS IN AN INTEGRATED MANNER.


THE LEVEL 2 SPEAKERS WILL BEGIN THEIR PRESENTATION BY RELATING
THEIR OBJECTIVES TO THOSE IDENTIFIED IN THE LEVEL 1 SPEAKER'S
OVERVIEW PRESENTATION. THEN, THE LEVEL 2 SPEAKERS SHOULD PRESENT ONE
OR MORE VIEWGRAPHS IN EACH OF THE FOLLOWING TOPICS. THEY SHOULD
DESCRIBE THE INVESTMENT STRATEGY AMONG THE DIFFERENT SUB-AREAS. THE
INVESTMENT STRATEGY INCLUDES THE DOLLARS ALLOCATED TO EACH OF THE
SUB-AREAS, AS WELL AS A RATIONALE FOR THE ALLOCATION. THE DOLLARS
ALLOCATED SHOULD BE PRESENTED IN TABULAR FORM FOR FY9X - FY9Y, WITH
FUNDS FOR THE SCIENCE AND TECHNOLOGY CATEGORIES BROKEN OUT. INCLUDED
IN THE INVESTMENT STRATEGY WILL BE A DESCRIPTION OF HOW THE AGENCY
PROGRAM RELATES TO OTHER AGENCY, INDUSTRY, AND OTHER COUNTRY PROGRAMS
AND PROJECTS. NEXT, THE BROAD S&T OBJECTIVES OF THE DIFFERENT
SUB-AREAS WILL BE IDENTIFIED. THE PURPOSE IS TO SET THE STAGE FOR THE
MORE DETAILED PRESENTATIONS OF THE SUB-AREA SPEAKERS WHICH ARE TO
FOLLOW. THE THIRD TOPIC IS THE COORDINATION AND INTEGRATION AMONG THE
DIFFERENT SUB-AREAS. THE FOURTH TOPIC IS THE AGENCY MISSION OBJECTIVES
AND RELEVANCE OF THE DIFFERENT SUB-AREAS. THE FINAL TOPIC RELATES TO
THE GAPS AMONG THE AREAS WHICH ARE NOT ADDRESSED BY THE INVESTMENT
STRATEGY.

ALL OF THE ABOVE CRITERIA SHOULD BE AT A DESCRIPTIVE LEVEL
COMMENSURATE WITH THE SUB-AREA RESOLUTION. MORE DETAILED
DESCRIPTIONS WILL BE PRESENTED BY THE SUB-AREA SPEAKERS.

THE LEVEL 3 SPEAKERS, EACH OF WHOM WILL COVER ONE SUB-AREA,
WILL HAVE GUIDANCE SIMILAR IN FORM TO THAT OF THE LEVEL 2
SPEAKERS. THE MAJOR DIFFERENCE WILL BE IN THE LEVEL OF
RESOLUTION OF DETAIL WHICH THE LEVEL 3 SPEAKERS WILL ADDRESS.

THE LEVEL 3 SPEAKERS SHOULD BEGIN THEIR PRESENTATION BY RELATING
THEIR OBJECTIVES TO THOSE COVERED IN THE LEVEL 2 SPEAKERS'
PRESENTATION. THEN, THE LEVEL 3 SPEAKERS SHOULD PRESENT THE
INVESTMENT STRATEGY FOR THEIR SUB-AREAS. THIS INVESTMENT
STRATEGY INCLUDES THE ALLOCATION OF FUNDING WITHIN THE SUB-AREA,
AND THE RATIONALE FOR THIS ALLOCATION. THE DOLLARS ALLOCATED
SHOULD BE PRESENTED IN TABULAR FORM FOR FY9X - FY9Y, WITH FUNDS
FOR SCIENCE AND TECHNOLOGY CATEGORIES BROKEN OUT. IT SHOULD PROVIDE
SOME UNDERSTANDING OF THE PRIORITIZATION OF TASKS AND PROGRAMS WITHIN
THE SUB-AREA. AS PART OF THE INVESTMENT STRATEGY, INTEGRATION WITH
OTHER AGENCY, INDUSTRY, AND COUNTRY PROGRAMS AND
PROJECTS SHOULD BE IDENTIFIED. THEN, THE S&T OBJECTIVES OF THE
AREAS DISCUSSED SHOULD BE PRESENTED. THE COORDINATION AND
INTEGRATION AMONG THE DIFFERENT PROGRAMS SHOULD BE ADDRESSED.
AGENCY MISSION OBJECTIVES AND RELEVANCE OF THE DIFFERENT PROGRAMS
SHOULD BE IDENTIFIED. THE S&T GAPS WHICH WERE NOT ADDRESSED BY THE
INVESTMENT STRATEGY SHOULD BE DISCUSSED AND IDENTIFIED.

ONE  OF  THE  KEY  CHALLENGES  IN  THE  PRESENTATIONS  WILL  BE  TO 
SHOW  THE  CONNECTIVITY  BETWEEN  THE  MORE  FOCUSED  TECHNOLOGY  AREAS 
AND  THE  MORE  FUNDAMENTAL  GENERIC  RESEARCH  AND  TECHNOLOGY  AREAS. 

TO DISPLAY BOTH VERTICAL AND HORIZONTAL RESEARCH/TECHNOLOGY
CONNECTIVITY, THE SUPPORTING SCIENCE AND TECHNOLOGY SPEAKERS WILL BE
CROSS-REFERENCED DURING FOCUSED AREA PRESENTATIONS, AND WILL
CROSS-REFERENCE EACH OTHER AS NECESSARY DURING SUPPORTING SCIENCE AND
TECHNOLOGY PRESENTATIONS.

6)  Importance  of  Dry  Runs 

THE  DRY  RUNS  SHOULD  BE  ITERATED  UNTIL  THEY  CONVERGE  TO  AN 
ACCEPTABLE  PRODUCT. 

TO FOSTER BETTER COORDINATION AND CROSS-REFERENCING AMONG THE
SPEAKERS DURING VUGRAPH PREPARATION, ROUGH DRAFTS OF EACH DRY RUN
SHOULD BE CIRCULATED TO ALL SPEAKERS A FEW WEEKS BEFORE DRY RUNS.

TO MINIMIZE EXCESSIVE TIME AT DRY RUNS, THE INTERMEDIATE
OVERVIEW SPEAKERS SHOULD DRY RUN THE FOCUSSED AREA AND FUTURE OPTIONS
SPEAKERS IN THEIR PURVIEW UNTIL SATISFIED. THEN THERE SHOULD BE FULL
SPEAKER ATTENDANCE AT THE FINAL DRY RUNS FOR THE ASSESSMENT MANAGER.
WELL BEFORE THE FULL ATTENDANCE DRY RUNS, THE OVERVIEW SPEAKERS
SHOULD PRESENT "ESSENTIALLY FINAL" DRY RUNS FOR THE ASSESSMENT
MANAGER.


7) Background Material

Maximum background material describing the research to be
assessed, related research and technology development sponsored by
external organizations, the organization, and other factors pertinent
to the assessment, should be provided to the reviewers as early as
possible before the review. This will allow the reviewers and
presenters to use their time most productively during the review BY
SUBSTITUTING  DIALOGUE  FOR  MONOLOGUE.  IT  COULD  POTENTIALLY  REDUCE  THE 
TOTAL  REVIEW  TIME  AS  WELL.  FOR  A  VERTICALLY  INTEGRATED  STRUCTURE 
WHICH  ENCOMPASSES  A  LARGE  NUMBER  OF  PROGRAMS,  REVIEW  TIME  BECOMES  AN 
IMPORTANT  CONSIDERATION  FOR  RETAINING  THE  AUDIENCE  AND  FOR 
AVAILABILITY  OF  HIGH  QUALITY  REVIEWERS.  PROVISION  OF  THIS  BACKGROUND 
MATERIAL  ON  FLOPPY  DISKS,  INCLUDING  VUGRAPHS,  SHOULD  BE  CONSIDERED  TO 
REDUCE  PAPER  FLOW. 

IN PARTICULAR, FOR EACH PROGRAM COVERED DURING THE ASSESSMENT,
A WRITTEN SUMMARY SHOULD BE PROVIDED WHICH INCLUDES:

NAME, PHONE NUMBER, AND ORGANIZATION OF THE PROGRAM MANAGER

TITLE OF PROGRAM

PROGRAM FUNDING FOR Z YEARS BY CATEGORY

DESCRIPTION OF TECHNICAL OBJECTIVES, CAPABILITY IMPROVEMENTS
EXPECTED IF SUCCESSFUL, AND POTENTIAL PAYOFF TO AGENCY MISSION

8)  Final  Presentations 

THE PRESENTATIONS TO THE PANEL SHOULD BE DEVELOPED IN A
SIMULTANEOUS ORDERLY "TOP-DOWN" AND "BOTTOM-UP" PROCESS. THE TOP
LEVEL OVERVIEW SHOULD BE DEVELOPED FIRST, THEN THE NEXT LEVEL
OVERVIEWS, AND FINALLY ITERATED WITH THE FOCUSSED PRESENTATIONS.

IN DEVELOPING THE OVERVIEW INVESTMENT STRATEGIES (AND THE
FOCUSSED AREA STRATEGIES AS WELL), THE FOLLOWING SEQUENCE SHOULD BE
USED. A PRIORITIZED SET OF MISSION REQUIREMENTS SHOULD BE GENERATED,
THEN TRANSLATED INTO A SET OF PRIORITIZED S&T REQUIREMENTS. THE
EXISTING AND PLANNED S&T PROGRAM WHICH IS DERIVED FROM THESE
REQUIREMENTS SHOULD THEN BE IDENTIFIED. ITS MAIN COMPONENTS SHOULD
BE PRIORITIZED. THE FUNDING ALLOCATION AMONG THESE COMPONENTS, AND
THE RATIONALE FOR THIS ALLOCATION WHICH SUPPORTS THE PRIORITIZATION,
SHOULD BE PROVIDED. OTHER AGENCY AND INDUSTRY FUNDING SHOULD BE
TAKEN INTO ACCOUNT TO SHOW THE IMPACT OF LEVERAGING ON THE INVESTMENT
STRATEGY.

ESTIMATES OF CAPABILITIES (QUANTITATIVE IF POSSIBLE) WHICH COULD
RESULT FROM SUCCESSFUL TECHNOLOGY DEVELOPMENT SHOULD BE SPECIFIED,
ALONG WITH ESTIMATES OF PRESENT CAPABILITIES, WITH SOME DESCRIPTION
OF HOW THESE PROJECTED CAPABILITY ESTIMATES WERE OBTAINED.

FOR VERTICALLY INTEGRATED PROGRAMS, THE PRESENTATION SHOULD
ALLOW THE PANEL TO IDENTIFY WHETHER THE DIFFERENT R&D CATEGORIES ARE
UNDER COMMON OR DIFFERENT MANAGEMENT, IN ORDER TO DISTINGUISH MORE
EASILY BETWEEN VERTICAL COORDINATION AND VERTICAL INTEGRATION.
BACKGROUND MATERIAL COULD SUPPLY MUCH OF THIS INFORMATION.

PRESENTATIONS  SHOULD  FOCUS  MORE  ON  THE  COORDINATED  AND 
INTEGRATED  VERTICAL  STRUCTURES  THAN  SPECIFIC  R&D  FUNDING  CATEGORIES. 


CROSS-REFERENCE THE SUPPORTING S&T SPEAKERS, AND VICE VERSA

THE PRESENTATIONS SHOULD BE SHORT AND FOCUS ON KEY ISSUES AND
THRUSTS. AN UPPER-LIMIT PERIOD OF AN HOUR, INCLUDING TIME FOR ANY
QUESTIONS, FOR ANY PRESENTATION SHOULD BE ESTABLISHED.

FOR RELEVANCE ORIENTED REVIEWS, THERE NEEDS TO BE A CLEAN BREAK
WITH THE TRADITIONAL PROGRAM REVIEWS, WITH THEIR EMPHASIS ON
MONOLOGUE PRESENTATIONS OF DETAILED TECHNICAL MATERIAL. IF
APPROPRIATE BACKGROUND MATERIAL IS SUPPLIED, INCLUDING VUGRAPHS, AND
ONLY ESSENTIAL VUGRAPHS ARE PRESENTED DURING THE REVIEW, IT MAY BE
POSSIBLE TO REDUCE THE LENGTH OF THE REVIEW BY DAYS. THIS WOULD AID
IN EXPANDING THE POOL OF POTENTIAL REVIEWERS.

THE AGENCY COULD CONSIDER HOLDING THE RELEVANCE REVIEWS IN SOME
RETREAT SETTING FOR THREE OR FOUR DAYS, WITH SUFFICIENT BACKGROUND
MATERIAL SENT TO THE REVIEWERS. THE FOCUS OF THE REVIEWS WOULD BE
ALMOST  ENTIRELY  ON  DIALOGUE.  PARTICIPANTS  COULD  BREAK  OFF  INTO  SMALL 
GROUPS  AFTER  THE  OFFICIAL  DISCUSSIONS  AND  CONTINUE  DIALOGUE  IN  A  MORE 
INFORMAL  MANNER  TO  CLEAR  UP  ANY  OUTSTANDING  ISSUES  AND  QUESTIONS. 

VUGRAPHS WITH STANDARDIZED CONTENT SHOULD BE REQUIRED FROM THE
PRESENTERS, CONTAINING THE FOLLOWING CRITERIA:

PRIORITIZED  AGENCY  MISSION  REQUIREMENTS  FOR  SPECIFIC  SUB-AREA  TO 
BE  PRESENTED. 

PRIORITIZED  S&T  REQUIREMENTS. 

INVESTMENT STRATEGY FOR MAIN SUB-AREA THRUSTS, INCLUDING THE
FUNDING AND THE RATIONALE FOR EXISTING AND PLANNED PROGRAMS.

COORDINATION  WITH  OTHER  RELATED  PROGRAMS. 

BROAD  S&T  OBJECTIVES  FOR  THE  SUB-AREA. 

S&T  GAPS  BASED  ON  REQUIREMENTS. 

S&T  OPPORTUNITIES. 

POTENTIAL  MULTIPLE  APPLICATION  PAYOFFS. 

THE  HIGHEST  LEVEL  OVERVIEW  SHOULD  BE  THE  MOST  COMPREHENSIVE 
PRESENTATION  IN  IDENTIFYING  SOURCES  OF  REQUIREMENTS. 

9)  Integrating  Assessment  Recommendations  into  Planning  Cycle 

Recommendations  resulting  from  the  assessment  should  be  tracked 
to  ensure  that  they  are  considered  and  implemented,  where 
appropriate.  For  research  programs,  planning,  execution,  and  review 
are  linked  intimately.  Feedback  from  the  review  outcomes  to  planning 
for  the  next  cycle  should  be  tracked  to  ensure  that  the 
review/planning  coupling  is  operable. 

Attachment 6 - REVIEW PANEL SELECTION APPROACHES

A review panel should have at least the following
characteristics:

1.  Each  member  should  be  highly  competent  in  the  facet  of  the 
program  for  which  he  has  been  selected 

2.  The  panel  as  a  body  should  have  sufficient  competence  to 
cover  all  major  facets  of  the  program  being  reviewed 

3.  Each  member  should  be  minimally  conflicted  with  the  program 
under  review,  and  any  conflicts  or  biases  should  be  known  to  all  the 
panel  members  before  the  review 

4.  Each  member  should  agree  to  read  all  background  material, 
attend  all  sessions,  and  protect  any  classified  and  proprietary 
information  which  arises  during  the  review 

Selection of an optimal review panel is more of an art than a
science at present, and depends on the selector's understanding of
the program being reviewed, on her understanding of the experts
available in the technical community, and on her ability to predict
the interaction dynamics of a particular group of experts.
Presently, different Federal agency approaches to panel selection
range from assembling program manager recommendations to using an
iterative co-nomination approach. Since the latter approach,
properly done, is relatively objective with respect to the program
being reviewed, the remainder of this attachment will focus on its
description.

In  essence,  the  iterative  co-nomination  approach  is  a  multi-step 
process  which  starts  with  an  input  list  of  recommended  experts  and 
converges  to  a  list  of  experts  who  have  been  multiply  nominated  by 
different  experts.  The  first  step  is  to  define  what  specifically  are 
the  technical  areas  to  be  reviewed,  and  what  is  the  objective  and 
expected  output  of  the  review.  Once  the  overall  technical 
description  of  the  program  is  generated,  and  technical  descriptions 
of  the  subdisciplines  are  provided,  reviewer  identification  can  be 
initiated. 

Sources of candidate reviewers can include program manager
recommendations, membership lists of prestigious organizations such as
the National Academies, agency review boards, agency consultant
pools, and other similar lists. (One of the real deficiencies in
present day pools of reviewer candidates is the absence of a
centralized updated pool of experts which spans the Federal agencies.
With present computer capabilities, a centralized list which includes
name, organization, biography, areas of expertise, previous panels
and panel references for thousands of experts, and is easily
accessible to assessment managers, would be simple to construct. It
could be updated continuously with input from program managers as
they become acquainted with new experts. Such a pool should be
instituted immediately after multi-agency agreement.) Multiple
names are chosen to cover each sub-discipline, the program as a
whole, allied research disciplines, the technologies, systems, and
operations which the program could potentially impact, and other
elements of the customer, stakeholder, user, and impactee
communities. This list of names is called level 1, or the initial
list.

Each member of level 1 is asked to identify, or nominate, other
experts in his particular area of expertise for the level 2 list.
For example, assume that a Physics program is being assessed. Assume
further that this program has three subdisciplines: plasma physics,
atomic physics, and molecular physics. The level 1 list may have two
names for each of the subdisciplines. To obtain the level 2 list for
the plasma physics research area of expertise, each of the two plasma
physics recommendees of level 1 would be asked to recommend two
experts in plasma physics. If names appear more than once in the
level 2 list, or between the level 1 and level 2 lists (multiply
recommended individuals), then these people are assumed to be the
leading experts in the fields to be assessed. If no multiple
recommendations appear, then the experts in level 2 are asked to
recommend two experts in plasma physics for level 3, and the
co-nomination search is repeated. Convergence occurs when an adequate
number of experts have been co-nominated. While this process may at
first seem complex and open-ended, convergence is rapid because of
the relatively small number of real experts in any well-defined
technical discipline.

A  primary  and  alternate  list  of  co-nominees  should  be  matrixed 
against  selection  requirements  and  criteria  as  shown  below,  where  the 
matrix  elements  represent  the  reviewer's  expertise  in  the  different 
facets  being  examined.  This  matrix  should  be  distributed  to  the 
program  managers  and  performers  who  will  be  reviewed,  and  comments 
related  to  bias  and  conflict  solicited.  If  strong  objections  can  be 
supported,  the  list  could  be  modified. 

REVIEWER/CRITERIA MATRIX

[Matrix layout not recoverable from the source. Rows list the
candidate reviewers (NAME 1 (OR1), NAME 2 (OR2), ..., NAME 6 (OR6));
legible column headings include SUB-, TOTL, EXPT, EXP, PRI, and ALT;
the legible cell entries are numerical scores between 2 and 10, with
some reviewers marked PRI.]
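
One use of such a matrix is to check the second panel characteristic listed above, that the panel as a body covers all major facets of the program. The sketch below is a hedged illustration only: the reviewer labels, facet names, scores, and the threshold of 8 for "sufficient competence" are all assumptions, not values from the matrix in the source.

```python
def uncovered_facets(matrix, threshold=8):
    """Return the facets for which no reviewer scores at least `threshold`.

    matrix    -- dict mapping reviewer name -> {facet: expertise score}
    threshold -- assumed minimum score for adequate coverage of a facet
    """
    facets = sorted({facet for scores in matrix.values() for facet in scores})
    return [
        facet
        for facet in facets
        if not any(scores.get(facet, 0) >= threshold for scores in matrix.values())
    ]


# Hypothetical reviewers and expertise scores.
matrix = {
    "Reviewer A": {"plasma": 10, "atomic": 7, "molecular": 6},
    "Reviewer B": {"plasma": 9, "atomic": 9, "molecular": 5},
}

gaps = uncovered_facets(matrix)
print(gaps)
```

A non-empty result would suggest drawing on the alternate list for a reviewer with stronger expertise in the uncovered facet.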

Attachment 7 - ASSESSMENT ISSUES FOR PRESENTATIONS

CRITERIA  FOR  AGENCY  REVIEWS 

1.  Quality  and  uniqueness  of  the  work 

2.  Scientific  and  technological  opportunities  in  areas  of  likely 
agency  mission  importance 

3.  Need  to  establish  a  balance  between  revolutionary  and  evolutionary 
work 

4.  Position  of  the  work  relative  to  the  forefront  of  other  efforts 

5.  Responsiveness  to  present  and  future  agency  mission  requirements 

6.  Possibilities  of  follow-on  programs  in  higher  R&D  categories 

7.  Appropriateness  of  the  efforts  for  agency  vice  other  agencies 

8.  A  reliance  (other  agency  coordination)  connection  of  the  work 


QUESTIONS  TO  BE  ASKED  OF  AGENCY  PROGRAMS 


1.  What  are  we  trying  to  do  (in  a  systems  concept)? 

2.  Can  specific  advantage  to  the  agency  be  identified  if  program  is 
successful? 

3.  How  is  the  system  done  today  and  what  are  the  limitations  of  the 
current  practice? 

4.  Would  the  work  be  supported  if  it  were  not  already  underway? 

5.  Assuming  success,  what  difference  does  it  make  to  the  user  in  a 
mission  area  content? 

6.  What  is  the  technical  content  of  the  program  and  how  does  it  fit 
with  other  ongoing  efforts  in  academia,  industry,  agency  labs,  DoE 
labs,  etc.? 

7.  What  are  the  decision  milestones  of  the  program? 

8.  How  long  will  the  program  take;  how  much  will  the  program  cost; 
what  are  the  mid-term  and  final  objectives  of  the  program? 

Attachment 8 - RESEARCH PRODUCT EVOLUTION TRACKING DATABASE

As  stated  previously,  central  to  credible  work  in  predicting  and 
tracking  the  diffusion  of  information  from  research  is  a  database  of 
research  products  at  various  evolutionary  stages  which  can  feed  the 
predictive  models.  This  database  of  research  products  could  be 
linked  in  part  with  the  above-proposed  database  of  research  and 
technology.  Since  the  research  product  evolutionary  pathways 
transcend  the  research  originating  organization,  and  can  intersect 
all  societal  sectors,  the  cooperation  of  many  public  and  private 
organizations  would  be  required  to  develop  a  database  of  research 
products  in  their  evolutionary  stages.  Development  and  construction 
of  such  a  database  should  start  in  the  near  future. 

One  approach  to  constructing  this  research  product  evolution 
database  has  its  conceptual  heritage  in  Kostoff  [1994i].  The 
products  of  research  and  technology  development  programs  would  be 
entered  into  a  database  on  a  periodic  basis.  The  research  and 
technology  product  antecedents  which  led  to  these  latest  products 
would  be  identified.  Linkages  would  be  constructed  to  show  the 
evolution  of  the  research  products  over  time,  with  appropriate  credit 
given  to  the  programs  which  spawned  the  initial  research  products. 

As a particular example of an entry in the proposed database,
assume that research program P1 has a number of products. These
products could include papers, patents, reports, presentations,
graduate students, etc. The various products would be entered into
the database, and their ties with P1 and its input characteristics
(evaluation scores, funding, etc.) would be retained. These products
would be related to their antecedents, and these antecedents would be
part of the database after the initial transient start-up period.
For example, a paper which resulted from P1 would be linked through
its references to research and development products (other papers,
patents, presentations, etc.) resulting from other programs. A
patent resulting from P1 would have similar linkages. For those
products whose antecedent research and development products cannot be
traced as easily as papers or patents (such as devices that are
developed and not published in the literature), the program manager
of P1 would enter the product and its antecedents in the database.
In technology development and engineering development programs, there
tends to be less of a readily available documentary trail of program
products, and the program manager of these types of programs would
have to supply more of the product and antecedent information than
the nominal research program manager.

Included with the entry of an antecedent in the database would
be some measure of its relative importance in the generation of the
research product resulting from P1. Thus, for a patent which
resulted from P1 and referred to five papers, some measure of
importance of the impact of each of the five papers on the successful
development of the patent of interest should be provided by the
program manager. If providing an importance measure proves to be
infeasible in practice because of sheer data volume limitations, then
all antecedents could be assumed to have equal importance. Provision
of an importance measure should not be ruled out at present, since
visionary approaches could conceivably overcome this problem.
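
A database entry of the kind described above might be represented as follows. This is a minimal sketch under assumptions: the Product record, its field names, the program label, and the choice to normalize importance measures so that they sum to 1 are all hypothetical, chosen only to show the equal-importance fallback.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """One research product and the antecedents it was built on."""
    product_id: str
    kind: str        # e.g. "paper", "patent", "report"
    program: str     # originating program (hypothetical label)
    # antecedent product id -> importance measure, or None if not supplied
    antecedents: dict = field(default_factory=dict)


def normalized_importance(product):
    """Return antecedent weights that sum to 1.

    If any importance measure is missing (or all are zero), every
    antecedent is assumed to be equally important.
    """
    ants = product.antecedents
    if not ants:
        return {}
    missing = any(w is None for w in ants.values())
    if missing or sum(ants.values()) == 0:
        return {a: 1.0 / len(ants) for a in ants}
    total = sum(ants.values())
    return {a: w / total for a, w in ants.items()}


# A patent citing five papers, with no importance measures supplied.
patent = Product(
    product_id="patent-1",
    kind="patent",
    program="PROG-A",
    antecedents={f"paper-{i}": None for i in range(1, 6)},
)
weights = normalized_importance(patent)
print(weights)
```

When the program manager does supply measures, for example 3 and 1 for two antecedents, the same function scales them to 0.75 and 0.25.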

Thus,  in  its  steady  state  operation  mode,  the  database  would 
consist  of  large  amounts  of  research  and  technology  development 
products  with  quantitative  measures  of  the  strength  of  their 
linkages.  If  it  were  desired  to  examine  the  multiple  impacts  of  a 
given  research  (or  technology)  program  on  downstream  'products',  then 
the  total  output  of  the  program  could  be  integrated  forward  in  time 
over  the  linkages.  The  downstream  impact  could  then  be  related  to 
the  program  inputs  (evaluation  scores,  funding,  etc.)  to  arrive  at 
the  desired  information.  Programs  with  little  downstream  impact 
would  be  identified  as  well  as  those  with  high  downstream  impact.  If 
it  were  desired  to  start  with  a  given  downstream  impact  (say,  a 
successfully  developed  system)  and  identify  those  research  and 
development  programs  which  contributed  to  successful  development  of 
the  system  (as  well  as  the  strength  of  their  contribution)  this  could 
be  done  as  well.  The  integration  would  be  performed  backwards  in 
time  over  the  linkages  to  arrive  at  the  various  research  and 
technology  development  products  which  spawned  the  successful  impact, 
and  the  research  and  technology  development  programs  could  then  be 
identified  from  their  products. 
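
The forward and backward integration over linkages can be illustrated with a small weighted graph. The sketch below rests on assumptions that the text does not specify: the strength of a path is taken to be the product of its link weights, contributions along different paths are summed, and the linkages are acyclic (each product links only to earlier antecedents). The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def downstream_impact(links, start):
    """Integrate forward from `start` over (antecedent, product, weight) links.

    Returns a dict mapping each downstream product to the summed strength
    of all paths reaching it. Assumes the links contain no cycles.
    """
    children = defaultdict(list)
    for antecedent, product, weight in links:
        children[antecedent].append((product, weight))

    impact = defaultdict(float)

    def walk(node, strength):
        for product, weight in children[node]:
            impact[product] += strength * weight
            walk(product, strength * weight)

    walk(start, 1.0)
    return dict(impact)


def upstream_contributors(links, end):
    """Integrate backward from `end` by reversing every link."""
    reversed_links = [(product, antecedent, weight)
                      for antecedent, product, weight in links]
    return downstream_impact(reversed_links, end)


# Hypothetical linkages: two papers feed a patent, which feeds a system.
links = [
    ("paper-1", "patent-1", 0.5),
    ("paper-2", "patent-1", 0.5),
    ("patent-1", "system-1", 1.0),
]

forward = downstream_impact(links, "paper-1")
backward = upstream_contributors(links, "system-1")
print(forward)
print(backward)
```

Grouping the products in the backward result by their originating program would then identify the programs that contributed to the system, along with the strength of each contribution.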

There  may  be  other  valid  approaches  to  developing  such  a  product 
tracking  database,  and  at  this  early  conceptual  stage,  all  approaches 
should  be  considered.  The  most  important  factor  is  for  government 
and  private  organizations  to  start  serious  planning  of  this  database 
in  the  near  future. 

Attachment 9 - SAMPLE GUIDANCE FOR QUALITY/RELEVANCE PROGRAM REVIEW
9A.  EXPANDED  VERSION  OF  GUIDANCE 

1.  In  the  winter  of  CY95,  the  agency  will  conduct  its  annual  series 
of  Department  Reviews  at  the  program  level.  To  introduce  these 
reviews,  each  Department  Director  will  provide  an  overview  of  that 
Department's  total  program,  followed  by  a  more  detailed  review  of  a 
portion  (at  least  one  third)  of  that  Department's  program.  The 
purpose  of  the  more  detailed  review  of  a  portion  of  the  program  is  to 
provide  the  means  for  examining  the  technical  quality  of  the  agency's 
research  program.  The  intended  audience  includes:  management, 
related  technical-matter  experts  and  integration  partners  from  the 
agency;  customer,  stakeholder,  user,  and  impactee  personnel;  and  the 
external  Science  and  Technology  (S&T)  communities.  Specific 
objectives  of  the  annual  reviews  include: 

a.  To  identify  the  research  investment,  technical  merit,  and 
accomplishments  of  the  science  supported  by  the  agency. 

b. To assess the technical quality and effectiveness of science
integration in the agency's programs in the context of the agency's
investment plan.

c.  To  identify  research  program  gaps  and  opportunities. 

The  FY95  reviews  will  give  the  agency's  senior  managers  an 
opportunity  to  review  the  agency's  research  programs  and  to  discuss 
program  integration. 

2. Department Heads should select and organize the more detailed
review  of  the  portion  of  their  program  into  topical  areas  appropriate 
for  achieving  the  stated  objectives.  One  day  should  be  allocated  to 
each  topical  area,  including  an  executive  session  at  the  end  of  each 
day  as  described  below. 

3.  A  panel  of  external  reviewers  (ER)  will  be  used  to  help  evaluate 
the  research  merit  and  relevance  of  programs  in  each  topical  area. 
In  view  of  the  diversity  of  topics  which  may  be  covered,  ER 
membership may vary from day to day. ER tasking is provided in
enclosure (1). ER members should include bench-level world-class
technical experts, generalists with appropriate expertise to provide
an independent assessment of research quality, and knowledgeable
representatives from the agency's customer, stakeholder, and user
community. Performers in the programs being reviewed will not be
eligible,  nor  will  anyone  who  has  a  financial  or  related  conflict 
with  the  program  being  reviewed. 

4. ER candidates should be submitted to the Director of Assessment.
The submittal package should include a technical description of each
sub-discipline to be reviewed, the candidates' names, organizations,
biographies, sources of candidates' names, and a matrix showing how
the candidates' expertise relates to the areas being reviewed. The
Director of Assessment will use the submitted names, as well as names
obtained from lists of members of the National Academy of Sciences,
National Academy of Engineering, Institute of Medicine, Federal
agency advisory board members and consultants, and other sources of
experts, as a starting point in the iterative co-nomination process
to arrive at final reviewer selection. To provide continuity to the
ER from year to year, ER members will receive staggered three-year
appointments.

5.  Immediately  after  the  scheduled  daily  presentations,  the  ER  will 
meet  in  executive  session  to  complete  its  program  assessment  task. 
The  ER  will  meet  with  the  Department  Head  and  members  of  the 
Department  during  lunch  to  discuss  the  Department's  programs.  At  the 
end  of  the  day,  the  ER  will  meet  with  the  agency's  senior  management 
to  discuss  the  review. 

6.  The  Department  Reviews  will  be  held  as  follows: 

a.  A  tentative  Department  Review  Schedule  is  provided. 
Department  Heads  will  provide  to  agency  senior  management  a  detailed 
agenda  one  month  prior  to  the  scheduled  presentation.  This  agenda 
will be used in the invitations for distribution by the Department
Heads.


b. Department Heads will 'host' their Department's reviews, and
will begin the first day with an approximately two-hour overview of
their program including 15 to 20 minutes for questions and
discussions (following days will rely on the Read-Ahead packages and
an abbreviated summary to fill in newly arrived reviewers). The
remaining  time  will  be  used  to  present  a  more  detailed  review  of  a 
portion  of  the  Department's  program.  Presentations  will  be  given  by 
the  appropriate  headquarters  program  officers,  not  by  contractors  or 
other  performers.  One  quarter  of  each  speaker's  time  slot  should  be 
reserved  for  questions  and  discussions.  Dry  runs  of  these 
presentations  are  strongly  encouraged. 

c.  Agency  senior  management  will  be  invited  by  the  Department 
Head,  who  also  should  invite  other  persons  who  will  contribute  to  or 
benefit  from  the  review.  For  instance,  these  persons  might  include 
appropriate  customers,  stakeholders,  users,  impactees,  other  Federal 
agency  managers  and  employees,  and  academic  and  industrial 
representatives  as  appropriate. 

7.  The  Department  Program  will  be  evaluated  using  the  criteria  and 
questions  in  enclosure  (2).  A  report  summarizing  the  reviewers' 
comments  on  the  Department  Program  and  recommended  action  items  is 
due  to  agency  senior  management  10  weeks  after  the  Review.  The 
senior  management  will  prepare  a  formal  list  of  follow-up  actions 
required. 

8.  The  philosophy,  principles,  structure,  and  goals  of  the 
Department  Overview  and  the  program  reviews  are  as  follows.  The 
Overview  presentation  should  show  how  the  Department  integrates  into 
the total organization, and how the Department's objectives relate to
those of the agency. Then, the investment strategy of the Department
should be presented in detail. This would include the relative
program priorities, the actual investment allocation to the different
programs, and the rationale for the investment allocation. (Later,
for  each  program  presentation,  the  investment  strategy  for  its  thrust 
areas  should  be  presented.) 

The  investment  strategy  is  perhaps  the  most  crucial  part  of  a 
Department  and  program  review,  and  deserves  further  discussion  here. 
While  investment  is  the  allocation  of  resources  among  the  Department 
and  program  components,  the  investment  strategy  is  the  rationale  for 
the  prioritization  and  allocation  of  resources  among  the  Department 
and  program  components.  The  optimal  investment  strategy  for  a 
Department and its programs is the focal point of the assessment, and
is that allocation and rationale which will produce the most
mission-relevant, high-quality research for impacting the Department's
and its programs' objectives.

The  optimal  investment  strategy  results  from  a  timely  confluence 
of  research  requirements  (top-down  driven)  and  promising  research 
opportunities (bottom-up driven). Further, promising research
opportunities  result  from  a  timely  confluence  of  advances  in  theory, 
instrumentation,  new  experiments,  new  algorithms,  and  computers. 
Finally,  research  requirements  result  from  a  timely  confluence  of 
domestic  and  foreign,  political  and  economic,  strategic  and  tactical 
advances.  All  of  the  above  factors  should  be  included  in  the 
presentation  of  the  investment  strategy. 

While the emphasis is on peer review, bibliometric and other
types of indicators should also be utilized. It is recommended strongly
that  sufficient  background  material  be  supplied  to  the  reviewers 
before  the  review.  This  would  include  organizational  descriptive 
material,  narrative  descriptions  of  each  program  to  be  reviewed,  and 
descriptive  material  of  each  work  unit  in  the  program.  It  would  also 
prove  useful  to  include  bibliometric  output  indicators  for  each 
program,  with  interpretive  analytical  material.  This  could  include 
refereed  papers,  patents,  awards  and  honors,  presentations,  etc.  It 
would  be  useful  to  include  narrative  material  on  related  programs  in 
other  agencies  and  industry. 

In particular, the following material should be presented. (In
the interest of saving time, appropriate items may be included in the
handout, and referred to, but not discussed. In such handout
material, amplifying statements, not bullets, should be provided.)

a.  The  Department's  Overview  should  be  just  that,  an  overview 
of  the  entire  Department's  program  including  outside  funding  and  a 
summary  of  related  programs  and  funding  at  other  agencies.  The 
vugraphs  of  the  Overview  (enclosure  3)  should  be  included  in  the 
handout  but  not  necessarily  discussed. 

b.  Indicate  what  has  transitioned  or  is  ready  for  transition. 

c.  New  ideas,  gaps  and  opportunities,  failures, 
accomplishments,  and  future  plans  are  the  heart  of  this  review. 

d.  Relevant  programs  managed  by  the  agency  for  outside 
organizations  should  be  included  in  the  discussions. 

e. Coordination of the Department program both within and
outside  the  agency  should  be  delineated.  This  should  include 
discussion  of  how  the  program  relates  to  other  Federal  agency 
programs,  industrial  programs,  and  international  programs. 

f.  Discuss  relevant  comments  from  National  Academy  panels  or 
other  appropriate  external  review/advisory  organizations  that 
influence  the  program. 

g.  Each  speaker  in  the  more  detailed  review  should  present 
his/her  investment  strategy  and  address  the  relation  of  his/her 
program  to  the  agency  investment  strategy. 

9. A distribution of written hand-out material will be made as
follows, with reference to the Timetable of enclosure (4):

a. A distribution of essential material, preferably by
electronic means, to the reviewers at least 2 weeks prior to the
review date. This material will include narrative descriptions of
each program to be reviewed, narrative descriptions of each project
within a given program, a list of performers, and a summary of
transitions, publications, patents, presentations, awards, and other
relevant measures of program quality and impact.

b. The final Department Review hand-out package should include
all Vugraphs and canonicals used in the Department Review, assembled
in a coherent document for all attendees. Make the Vugraphs
stand-alone items with a title, properly labeled coordinates and a
short descriptive text. Each Vugraph must be dated. Include updated
program narrative descriptions for the more detailed portion of the
program being reviewed.

c. The final Department Review hand-out package of essential
materials will be distributed 2 days before the review to the agency
senior management. At least 20 copies should be available for
distribution at the review.

10. A Department Review Report is due 10 weeks after the Review that
includes:


a.  List  ER  members  and  their  affiliations. 

b.  ER  comments  and  suggestions. 

c. Major accomplishments/transitions, each with a sentence
or two.

d. Program failures (inability to meet stated program
objectives, which is not necessarily a sign of deficient performance),
disappointments or phase outs/downs. These are inevitable in a
program with an appropriate balance of high risk/high payoff
projects.

e. Action Items. Written comments on Action Items should
be completed by MAY 1995 for assembling in the final
agency report of the FY94 Reviews.
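
The deadlines stated in items 9 and 10 can be derived from the review date with simple date arithmetic. The sketch below is illustrative only: the function name, the dictionary keys, and the sample review date are hypothetical, and "at least 2 weeks prior" is treated as exactly 14 days.

```python
from datetime import date, timedelta


def review_deadlines(review_date):
    """Compute the hand-out and report deadlines relative to a review date."""
    return {
        # Essential material reaches reviewers at least 2 weeks before.
        "read_ahead_to_reviewers": review_date - timedelta(weeks=2),
        # Final hand-out package goes to senior management 2 days before.
        "final_package_to_senior_management": review_date - timedelta(days=2),
        # Department Review Report is due 10 weeks after the Review.
        "department_review_report": review_date + timedelta(weeks=10),
    }


# Hypothetical review date.
deadlines = review_deadlines(date(1995, 2, 15))
for name, due in deadlines.items():
    print(name, due.isoformat())
```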

EXTERNAL REVIEWER TASKING


Enclosure (1)


Preparation 

1. Agree to the envisioned 3-year term as an ER member

2.  Execute  the  conflict  of  interest  form 

3.  Complete  the  appropriate  travel  forms 

4. Digest the Read-Ahead package, including the Review Criteria

Program Assessment

Participate in the daily reviews, gaining the understanding
necessary to apply the Review Criteria, to complete the review
questionnaire, and to formulate measures of technical quality of the
agency's research program. Discuss program content with agency
senior management.

Daily  outbrief 

Discuss  with  agency  senior  management  the  effectiveness  of 
briefings,  addressing  programmatic  concerns  and  other  issues. 

Summary  on-site  findings 

Participate  in  final  daily  outbrief,  summarizing  results  to  the 
extent  possible. 

Written  report 

Within  2  weeks  of  review  completion,  finalize  findings  and  mail 
to  agency. 


Enclosure  (2) 
CRITERIA  FOR  AGENCY  REVIEWS 


1. Scientific quality and uniqueness of ongoing and
proposed  efforts 

2.  Scientific  opportunities  in  areas  of 
likely  user  importance 

3. Balance between revolutionary and evolutionary research

4.  Position  of  research  relative  to  forefront 
of  other  scientific  efforts 

5.  Responsiveness  to  present  and  future  user  requirements 

6.  Possibilities  of  follow-on  programs  in  higher  R&D  categories 

7.  Appropriateness  of  research  for  agency  vice  other  Federal 
agencies. 

QUESTIONS FOR AGENCY PROGRAMS

1. What is the investment strategy of the larger management unit?
This would include the relative program priorities, the actual
investment allocation to the different programs, and the rationale
for the investment allocation. For each program being reviewed, what
is the investment strategy for its thrust areas?

2.  Can  specific  advantage  to  customer  be  identified  if  program 
is  successful? 

3.  Would  efforts  be  supported  if  they  were  not  already  underway? 

4.  What  is  the  technological  context  of  the  program  and  how 
does  it  fit  with  other  ongoing  research  in  academia,  industry, 
and  other  Federal  agencies? 

5.  Is  the  program  appropriately  coordinated  with  programs  at 
other  research  organizations? 

6.  What  are  the  research  objectives  of  the  program?  What  are 
the  "mid  term"  and  "final"  assessment  criteria?  How  much
will  the  program  cost? 

7.  What  is  the  program  trying  to  do? 

8.  How  is  the  program  (effort)  done  today?  What  are  the 
limitations  of  the  current  practice? 

9.  What  is  new  in  the  approach?  Why  will  approach  be  successful? 

10.  What  are  the  major  risks  of  the  program? 

11.  Assuming  program  is  successful,  what  difference  will  the 
result  make  to  customer  capabilities? 


Enclosure  (3) 

DEPARTMENT  OVERVIEW 
DEPARTMENT  OBJECTIVE:

SCOPE  OF  PROGRAM: 

(Statement  of  Science  giving  a  useful  taxonomy  for  the  programmatic 
content) 

Enclosure  (3A) 

DEPARTMENT  OVERVIEW 

TECHNICAL  ISSUES: 

(What  are  the  major  research  issues  currently  being  addressed  by 
the  Department  covering  the  spectrum  of  the  program?) 
Enclosure  (3B)

DEPARTMENT  OVERVIEW 

MAJOR  RESEARCH  ACCOMPLISHMENTS  (IN  PRIORITY):

(Do  not  make  so  concise  as  to  be  unintelligible  to  the
non-specialist.  Include  these  accomplishments  in  DEPTREV  Report.)


Enclosure  (3C) 

DEPARTMENT  OVERVIEW 

Department  TRANSITIONS  (in  order  of  importance  -  list  transition 
recipient,  summarize  in  the  DEPTREV  Report.) 


Enclosure  (3D) 

DEPARTMENT  OVERVIEW 
PLANS: 

(Show  how  plans  lead  to  major  leap  ahead  capability.  Show  how 
program  solved  or  helped  solve  a  major  problem.  Where  does  it  fit  in 
the  big  picture?) 


Enclosure  (3E) 

DEPARTMENT  OVERVIEW 

AGENCIES  WITH  SIMILAR  OR  ALLIED  EFFORTS  (ANNUAL  $) 

(Program  similarities/differences.  Where  does  the  effort  fit  in 
the  national  research  effort?) 


Enclosure  (3F) 

Include  these  sheets  in  DEPTREV  Report. 

DEPARTMENT  OVERVIEW  FINANCIAL  SUMMARY  SHEET 

FY94  FY95  FY96  FY97 
(Program/Project  Title)  (Obligated  and  planned  $) 


OTHER 

Total  FY94  $K  spent 
Number  of  FY94  tasks,  # 
Average  FY94  task  cost*,  $K 
Average  FY93  task  cost*,  $K 
New  FY94  projects  initiated,  # 
FY94  funds  in  new  starts,  $K 
FY94  funds  in  new  starts,  %$ 
9B.  ABRIDGED  AND  MODIFIED  VERSION  OF  GUIDANCE

Sample  Peer  Review  Guidance 

A)  Overall  Objectives 

1.  Review  1/3  of  organization's  (Department,  Division,  Office, 
etc.)  programs  in  depth  each  year;  overview  remainder  of 
organization's  programs;  total  organization  program  reviewed 
triennially. 

2.  Review  vertically  integrated  programs  as  a  unit. 

3.  Primary  focus  on  technical  quality,  but  address  relevance, 
integration,  and  investment  strategy  as  well. 

4.  Board  of  Visitors  (BOV)  provides  comments  on  review.  Written 
comments  provided  independently  to  agency  staffer,  who  produces 
report.  The  BOV  consists  of  independent  experts  representing 
science,  technology,  customer,  and  other  agencies. 

5.  Invited  review  audience  includes  customers,  stakeholders, 
users,  impactees,  and  other  agency  representatives. 

6.  Summary  report  with  responses  to  reviewers'  comments  and 
action  items  due  to  agency  senior  management  after  review. 
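
The rotation in Objective 1 (one third of the programs reviewed in depth each year, the remainder overviewed, and the whole organization covered every three years) can be sketched as a simple schedule. This is only an illustration: the function name, the program names, the starting year, and the round-robin grouping rule are assumptions and are not part of the guidance.

```python
def triennial_schedule(programs, start_year):
    """Split programs into three groups; each group gets one in-depth year.

    The round-robin split is an assumed grouping rule; the guidance only
    says that about one third is reviewed in depth each year.
    """
    groups = [programs[i::3] for i in range(3)]
    schedule = {}
    for offset, in_depth in enumerate(groups):
        schedule[start_year + offset] = {
            "in_depth": in_depth,
            # Everything not reviewed in depth that year gets an overview.
            "overview": [p for p in programs if p not in in_depth],
        }
    return schedule


# Hypothetical program names, for illustration only.
programs = ["Program A", "Program B", "Program C",
            "Program D", "Program E", "Program F"]
schedule = triennial_schedule(programs, 1994)
for year, plan in schedule.items():
    print(year, "in depth:", plan["in_depth"], "| overview:", plan["overview"])
```

In practice, vertically integrated programs (Objective 2) would be kept in the same group so that they are reviewed as a unit.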

B)  Sequence  of  Events 

1)  Selection  of  Reviewers 

A  science  and  technology  taxonomy  of  the  program  to  be  reviewed 
in  detail  is  generated,  and  brief  descriptors  of  each  taxonomy 
element  are  generated  for  reviewer  selection  purposes.  The  BOV  is 
selected  so  that  it  can  address  in  aggregate  detailed  science  and 
technology  quality,  research  and  technology  gaps  and  opportunities, 
broader  technology  and  organizational  issues,  and  mission  relevance 
issues.  Sources  of  reviewers  could  include  Defense  Science  Board,
NAS,  NAE,  AFSAB,  NSB,  AAC  (NASA),  and  program  manager
recommendations.  The  names  of  proposed  reviewers  are  presented  to 
the  agency  Director  for  approval  before  they  are  notified.  All 
reviewers  are  required  to  sign  non-conflict-of-interest  statements. 

2)  Distribution  of  Background  Material 

To  ensure  that  review  time  is  used  most  efficiently,  reviewers
and  invited  audience  receive  background  material  which  will  set  the
stage  for  the  actual  review.  This  background  material  includes  the
following  administrative  and  technical  canonical  material:

a.  Structural  chart  of  agency,  showing  how  organization  fits 
into  agency  structure 

b.  Structural  chart  of  organization,  showing  programs  (including 
funding)  and  personnel  associated  with  each  program 

c.  Definitions  of  different  generic  types  of  programs  which  will 
be  presented  during  review 

d.  Other  administrative  material  (agenda,  reimbursement,  etc.) 

e.  Two  page  overview  of  each  program  being  reviewed  in  detail
(e.g.,  Weapons  Technology),  including  program  objective,  program
thrusts  (e.g.,  Aerodynamics,  Ordnance,  G&C,  etc.),  and  investment
allocation  among  thrusts  (three  year  trends)
f.  Two  page  overview  of  each  program  thrust,  including  thrust
objective  and  short  descriptions  of  each  technical  sub-thrust  (e.g.,
energetic  propellants,  combustion  instability,  propellant  safety)
pursued  under  the  thrust  as  well  as  investment  allocations  among
sub-thrusts.  Total  program  and  thrust  descriptive  material  should  not
exceed  twenty  pages.

3)  Senior  Management  Introductory  Presentation 

To  initiate  the  actual  review,  a  senior  agency  manager  provides 
a  short  introduction  describing  structure  and  mission  of  the  agency, 
the  role  of  the  different  corporate  review  processes  in  executing  the 
mission,  and  a  more  detailed  description  of  the  purpose  and  goals  of 
Department  review.  This  person  describes  what  is  expected  from  BOV, 
and  how  BOV  comments  will  be  utilized. 

4)  Organization  Head  Presentation 

The  broader  technical  portion  of  the  presentations  is  initiated 
by  the  Organization  Head,  and  it  includes: 

a.  Mission  and  objectives  of  organization 

b.  List  of  all  programs  in  organization;  describe  objectives  of 
each  program,  show  funds  and  people  associated  with  each  program; 
note  program  to  be  reviewed  in  detail 

c.  Accomplishments  and  transitions  of  programs  not  being 
reviewed  in  detail;  relation  of  accomplishments  and  transitions  to 
organization's  mission  and  potential  national  impact 

d.  Responses  to  actions  from  previous  year's  review 

5)  Program  Manager  Presentation 

Each  program  manager  then  provides  a  more  detailed  overview  of 
the  program,  including: 

a.  Objectives  of  program 

b.  Requirements  to  be  met  (For  example,  in  the  review  of  a
military-oriented  program:  what  is  the  present  and  evolving  threat  -
identify  documented  sources,  personal  contact  sources,  etc.;  what  is
the  importance  of  the  threat;  what  are  the  capabilities  required  to
overcome  threat)

c.  Investment  strategy 

c1.  List  of  thrusts  (e.g.,  Propulsion,  Aerodynamics,  G&C)  and
sub-thrusts  (e.g.,  energetic  propellants,  combustion  instability,
propellant  safety)  selected  to  meet  requirements

c2.  Objectives  of  each  thrust

c3.  Thrust  and  sub-thrust  funding  and  prioritization

c4.  Rationale  for  thrust  and  sub-thrust  selection  and
prioritization  (including  bases  for  rationale  and  prioritization  such
as  system  studies,  workshops,  assessments,  intuition,  congressional
and  other  mandates,  etc.)

c5.  Integration  of  thrusts  and  sub-thrusts  to  form  program

c6.  Coordination/Roadmaps

c6i.  Roadmaps  describe  past,  present,  and  future  of  program  and 
linkage  to  other  internal  and  external  programs 

c6ii.  Roadmaps  contain  the  three  dimensions  of  time,  project 
title/  sponsor,  and  project  funding 
d.  Team  quality  (identify  S&T  performers)

e.  Summary  of  major  accomplishments,  transitions,  milestones  met 

6)  Technical  Manager  Presentation 

The  technical  managers  who  support  the  program  manager  will  present 
the  following: 

a.  Objectives  of  each  sub-thrust 

b.  Technical  roadblocks  to  achieving  the  sub-thrust  objectives 

c.  Technical  approach  for  overcoming  the  sub-thrust  roadblocks

d.  Potential  sub-thrust  payoffs  and  capability  enhancements 

e.  Technical  results  achieved 

7)  Reviewers'  Written  Comments

The  reviewers  fill  out  an  evaluation  form,  and  provide  it  to  the 
agency  review  manager  at  the  end  of  the  review.  A  sample  short 
evaluation  form  follows. 

PRESENTATION  EVALUATION  SHORT  FORM 

COMMENTS  (PLEASE  PROVIDE  YOUR  COMMENTS  IN  NARRATIVE  FORM.  WHERE 
APPLICABLE,  INCLUDE  YOUR  ASSESSMENT  OF  RELEVANCE,  GAPS  AND 
OPPORTUNITIES,  INVESTMENT  STRATEGY,  COORDINATION,  TECHNICAL  APPROACH, 
TEAM  QUALITY,  POTENTIAL  PAYOFF,  PRODUCTIVITY  AND  IMPACT.  THESE 
EVALUATION  CRITERIA  HAVE  BEEN  DEFINED  ON  THE  FIRST  PAGE  OF  YOUR 
EVALUATION  PACKAGE.) 

Reviewers  are  invited  to  submit  further  written  comments  after 
they  return  home. 


Attachment  10  -  ESTIMATE  OF  PEER  REVIEW  COST 

Another  problem  with  peer  review  is  cost.  The  true  total  costs 
of  peer  review,  as  will  be  shown,  can  be  considerable  but  tend  to  be 
ignored  or  understated  in  most  reported  cases.  Because  there  are 
many  different  types  of  peer  review,  it  is  very  difficult  to  provide 
a  total  cost  rule-of-thumb  for  generic  peer  review.  Nevertheless, 
consider  the  following  illustrative  example  for  an  order  of  magnitude 
estimate  on  total  peer  review  costs. 

Assume  that  an  interim  peer  review  is  desired  of  a  $1M/yr
program  at  a  laboratory.  The  review  mode  of  operation  will  be  to 
bring  a  panel  of  experts  to  the  laboratory  site  for  two  days,  and 
hear  presentations  from  the  principal  investigators.  Assume  that  the 
panel  consists  of  ten  experts  in  research,  technology,  mission 
operations,  etc.,  and  that  eight  principal  investigators  will  present 
their  projects  to  the  panel.  The  loaded  cost  (salary  plus  overhead) 
for  each  panel  member  is  assumed  to  be  $150,000  per  year,  and  the 
loaded  cost  for  each  principal  investigator  is  assumed  to  be  $125,000 
per  year.  Direct  expenditures,  such  as  panel  per  diem  and  travel 
costs,  would  be  in  the  neighborhood  of  $6,000-8,000.  Any  honoraria 
would  increase  this  cost. 

Indirect  expenditures,  such  as  total  reviewer,  presenter,  staff, 
and  review  audience  time  spent  toward  the  review,  would  be  in  the 
range  of  $125,000  and  would  include  at  least  the  following: 

1.  Presenter  time  in  preparing  background  material  for  reviewers 
to  read  before  review,  preparing  the  presentation,  making  dry  runs 
for  management,  etc.  [$40,000  estimate;  80  person-days]; 

2.  Panel  member  time  for  reading  background  material  (papers, 
reports,  plans),  traveling  to  review,  spending  time  at  meeting,
writing  report,  etc.  [$48,000-60,000  estimate;  80-100  person-days]; 

3.  Agency  staff  time  for  identifying  and  soliciting  reviewers, 
establishing  review  and  coordinating  with  lab,  writing  reports,  etc. 
[$10,000  estimate;  20  person-days]; 

4.  Audience  (lab  management,  other  lab  personnel,  other  agency
representatives,  etc.)  time  at  review  [$20,000  estimate;  40
person-days].
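
The arithmetic behind the four items above can be reproduced with a short calculation. This is a sketch under stated assumptions: a 250-day working year is assumed because it reproduces the per-day figures implied by the estimates ($600 per panel-member day, $500 per day for the others), and agency staff and audience members are assumed to carry the same $125,000 loaded cost as the principal investigators, which the text implies but does not state.

```python
WORK_DAYS_PER_YEAR = 250  # assumption; reproduces the per-day rates implied above


def daily_rate(loaded_annual_cost):
    """Loaded cost (salary plus overhead) per working day."""
    return loaded_annual_cost / WORK_DAYS_PER_YEAR


PANEL_RATE = daily_rate(150_000)  # panel members
OTHER_RATE = daily_rate(125_000)  # presenters; staff and audience assumed equal

# (description, daily rate, low person-days, high person-days)
INDIRECT_ITEMS = [
    ("Presenter preparation", OTHER_RATE, 80, 80),
    ("Panel member time", PANEL_RATE, 80, 100),
    ("Agency staff time", OTHER_RATE, 20, 20),
    ("Audience time", OTHER_RATE, 40, 40),
]

DIRECT_LOW, DIRECT_HIGH = 6_000, 8_000  # per diem and travel, from the text
PROGRAM_COST = 1_000_000                # the $1M/yr program being reviewed

indirect_low = sum(rate * lo for _, rate, lo, _ in INDIRECT_ITEMS)
indirect_high = sum(rate * hi for _, rate, _, hi in INDIRECT_ITEMS)
total_low = indirect_low + DIRECT_LOW
total_high = indirect_high + DIRECT_HIGH

print(f"Indirect: ${indirect_low:,.0f} to ${indirect_high:,.0f}")
print(f"Total:    ${total_low:,.0f} to ${total_high:,.0f}")
print(f"Share of program: {total_low / PROGRAM_COST:.1%} to "
      f"{total_high / PROGRAM_COST:.1%}")
```

Under these assumptions the indirect costs come to $118,000-130,000, consistent with the "range of $125,000" quoted above. The total is $124,000-138,000, roughly 12 to 14 percent of the annual program cost and well over ten times the direct expenditures.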

The  main  conclusion  of  this  discussion  is  that  for  serious 
panel-type  peer  reviews,  where  sufficient  expertise  is  represented  on 
the  panels,  total  real  costs  will  dominate  direct  costs.  This 
conclusion  would  also  be  true  for  mail-type  peer  reviews.  While  the 
total  costs  of  mail-type  peer  reviews  would  be  less  than  those  of 
panel-type  peer  reviews  due  to  the  absence  of  travel  costs,  the  ratio 
of  total  costs  to  direct  costs  for  mail-type  peer  reviews  would  be 
very  high.  The  major  contributor  to  total  costs  for  either  type  of 
review  is  the  time  of  all  the  players  involved  in  executing  the 
review.  With  high  quality  performers  and  reviewers,  time  costs  are 
high,  and  the  total  review  costs  can  be  a  non-negligible  fraction  of 
total  program  costs,  especially  for  programs  that  are  people 
intensive  rather  than  hardware  intensive. 
Attachment  11  -  DOE  PROCEDURES  FOR  PEER  REVIEW  ASSESSMENTS

EXECUTIVE  SUMMARY 
PEER  REVIEW  ASSESSMENT  PROCESS 

The  Office  of  Program  Analysis  (OPA)  conducts  peer  review 
assessments  of  Department  of  Energy  (DOE)  research  and  development 
projects.  The  purpose  of  these  reviews  is  to  provide  independent 
assessment  of  the  quality  and  impact  of  each  of  the  individual 
projects  which  comprise  a  program. 

These  reviews  are  carried  out  by  five  to  nine  scientific  and 
technical  experts  who  evaluate  individual  projects  and  provide 
their  individual  numeric  ratings  and  commentary  to  OPA.  OPA 
chooses  the  reviewers  after  obtaining  recommendations  from  the 
research  program  managers  and  the  principal  investigators  of  the 
projects  to  be  reviewed.  OPA  also  conducts  its  own  independent 
search  for  the  best  qualified  candidates  from  academia,  industry, 
Government  laboratories,  and  other  sources. 

Presentations  to  the  reviewers  are  made  by  program  managers 
and  principal  investigators.  After  a  question  and  answer  period, 
which  includes  clarification  of  technical  questions  and  a  review  of 
the  evaluation  criteria,  reviewers  prepare  individual  numeric 
ratings  and  commentary.  An  OPA  staff  member  is  present  throughout
the  review  process  to  ensure  that  these  procedures  are  applied
uniformly  and  that  the  reviewers  make  no  attempt  to  reach  consensus,
and  to  receive  the  reviewers'  individual  inputs.


PROCEDURES  FOR  PEER  REVIEW  ASSESSMENTS 


Introduction 

These  assessment  procedures  provide  the  basis  for  implementing 
the  methodology  developed  by  the  Office  of  Program  Analysis  (OPA) 
for  assessing  the  quality  and  relevance  of  research  and  development 
within  the  Department  of  Energy  (DOE).  The  reviews  are  performed
by  examining  individual  projects  which  comprise  a  program  and  by 
assessing  the  quality  of  the  research,  quality  of  the  research 
team,  productivity,  probability  of  success,  and  mission  relevance 
for  each  project  reviewed. 

OPA's  methodology  relies  upon  scientific  and  technical  experts 
to  evaluate  individual  projects.  Analysis  of  the  ratings  given 
individual  projects  can  contribute  to  the  evaluation  of  a  program. 

Methodology 

Reviewers  are  recruited  in  the  specific  technical  area  of  a 
set  of  related  research  projects.  Project  reviews  take  place  in 
sessions  lasting  from  two  to  four  days.  Prior  to  the  review 
meeting,  a  package  of  documentation  covering  the  subject  areas  is 
requested  from  each  Principal  Investigator  to  help  reviewers 
prepare  for  the  review  session.  An  outline  of  this  documentation 
is  provided  in  Appendix  A,  "Information  to  be  Provided  and 
Presented  by  the  Project  Principal  Investigator".

Because  many  of  the  projects  are  broad  and  multi-disciplinary, 
input  from  a  broad  range  of  experts  is  often  necessary  to  perform 
the  review.  Every  effort  is  made  so  that  at  least  two  of  the 
reviewers  are  expert  in  the  principal  scientific  disciplines  or 
technical  areas  of  each  research  project.  Reviewers  are  drawn  from 
academic  institutions,  industry,  Government  laboratories,  and  other 
sources,  as  appropriate. 
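
The two panel-composition rules stated in these procedures (five to nine reviewers, and at least two reviewers expert in each project's principal disciplines) can be checked mechanically once reviewer expertise is tabulated. The sketch below is illustrative only: the function names, reviewer and project labels, and discipline names are hypothetical, and applying the two-expert rule to each discipline separately is one possible reading of the text.

```python
def panel_size_ok(reviewer_expertise, low=5, high=9):
    """True if the panel has between five and nine reviewers."""
    return low <= len(reviewer_expertise) <= high


def coverage_gaps(project_disciplines, reviewer_expertise, min_experts=2):
    """Return, per project, the disciplines covered by fewer than
    min_experts reviewers. An empty dict means no gaps were found."""
    gaps = {}
    for project, disciplines in project_disciplines.items():
        for discipline in disciplines:
            experts = [name for name, areas in reviewer_expertise.items()
                       if discipline in areas]
            if len(experts) < min_experts:
                gaps.setdefault(project, []).append(discipline)
    return gaps


# Hypothetical panel and projects, for illustration only.
reviewer_expertise = {
    "R1": {"combustion", "fluid dynamics"},
    "R2": {"combustion"},
    "R3": {"materials"},
    "R4": {"fluid dynamics", "materials"},
    "R5": {"catalysis"},
}
project_disciplines = {
    "P1": ["combustion", "materials"],
    "P2": ["catalysis", "fluid dynamics"],
}

print(panel_size_ok(reviewer_expertise))
print(coverage_gaps(project_disciplines, reviewer_expertise))
```

Here project P2 would be flagged because only one reviewer covers catalysis, signalling that another reviewer should be recruited in that area.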

OPA  chooses  reviewers  after  obtaining  recommendations  from  the 
DOE  program  office,  from  the  project  principal  investigators  to  be 
reviewed,  and  by  an  independent  search  for  qualified  candidates. 

From  its  staff,  OPA  designates  a  Technical  Project  Officer  to 
lead  the  assessment  and  staff  members  for  each  group  of  projects. 
The  OPA  staff  member  facilitates  performance  of  the  assessment  and 
assists  in  completing  the  required  tasks.  The  OPA  staff  member 
also  ensures  that  the  reviewers  strictly  adhere  to  these  assessment 
procedures,  including  limiting  discussions  to  exchanges  of 
information. 

The  peer  review  focuses  on  the  scientific  and  technological 
aspects  and  the  mission  relevance  of  the  projects  reviewed  and  not 
on  budgetary  or  management  issues.  Reviewers  evaluate  the
scientific  and  technical  merit  and  quality  of  the  research  being
performed  under  the  current  contract  or  grant,  or  in  the  case  of  a 
recent  extension  or  continuation  of  the  project  to  be  reviewed,  the 
immediately  preceding  contract  or  grant.  The  reviewers  evaluate 
the  most  recent  project  performance,  results,  and  products. 

To  begin,  the  reviewers  meet  in  a  plenary  session.  After 
welcoming  remarks  and  an  overview  of  the  program  being  reviewed,  a 
summary  of  the  review  process  is  presented,  key  staff  personnel  are 
introduced,  and  questions  are  answered.  Reviewers  then  proceed  to 
listen  to  the  presentation  of  projects  by  the  principal 
investigators.

Assessment  of  Projects 

Prior  to  the  appearance  of  the  first  principal  investigator,  a 
DOE  program  manager  briefs  the  reviewers  for  up  to  15  minutes  on 
the  set  of  projects  to  be  reviewed.  This  overview,  which  can  take 
the  form  of  several  briefings  rather  than  one,  orients  the  panel  on 
the  history,  specific  objectives,  context  within  the  DOE  program 
area,  and  context  within  the  field  of  the  program  area  of  each  set 
of  projects. 

The  principal  investigator's  30  minute  briefing  emphasizes  the 
scientific  and  technical  aspects  of  the  project,  namely: 

1.  specific  project  objectives  and  how  they  relate  to  the  DOE 
program's  mission; 

2.  resources  of  time,  special  talents,  and  facilities  used; 

3.  scientific  and  technical  content  of  the  project  (issues 
being  addressed  and  their  significance  and  importance  to  the  DOE 
program);

4.  experimental  and  theoretical  approaches;  and 

5.  major  recent  accomplishments  of  the  project  together  with 
supporting  data.

Principal  investigators  also  inform  reviewers  of  the  identity
of  project  staff  and  any  collaborators,  including  their  education, 
expertise,  and  role  in  the  project,  and  the  project's  history 
showing  DOE  and  non-DOE  funding  broken  down  by  the  periods  of  time 
in  which  the  funding  was  received. 

The  principal  investigator's  oral  briefing  supplies  the 
reviewers  with  sufficient  information  to  evaluate  the  project  using 
the  factors,  criteria,  and  formats  given  in  Form  1.  A  list  of 
topics  to  be  included  in  the  oral  presentation  is  provided  in 
Appendix  A,  "Information  to  be  Provided  and  Presented  by  the 
Project  Principal  Investigator".  While  the  documentation  may 
complement  the  oral  presentation,  it  does  not  supplant  the 
requirement  for  a  self-sufficient  briefing  to  the  reviewers  by  the 
Principal  Investigator. 

Reviewers  independently  assess  the  quality  of  the  research, 
quality  of  the  research  team,  productivity,  mission  relevance,  and 
choose  an  overall  project  rating  using  the  factors,  criteria,  and
formats  given  in  Form  1.  They  also  complete  a  Self-Rating  Form 
(Form  2)  for  the  project  being  rated.  The  self-rating  form  assists 
OPA  staff  in  preparing  their  summary. 

The  assessment  forms  are  followed  precisely  in  performing  the 
evaluation  to  ensure  consistency  of  review  for  all  projects.  Any 
issues  that  arise  regarding  the  meanings  of  the  factors  or  criteria 
on  the  forms,  or  other  matters  requiring  interpretation  of  these 
procedures,  are  resolved  by  the  OPA  staff. 

Identification  of  Research  Needs  and  Opportunities 

Based  on  their  knowledge  of  the  needs  of  the  program  area, 
together  with  the  newly  acquired  information  pertaining  to  the 
nature  and  quality  of  DOE-funded  projects  in  their  areas  of 
expertise,  reviewers  are  asked  to  individually  identify  research 
needs  and  opportunities  with  respect  to  the  DOE  research  program  in 
the  specific  area  of  the  panel  review. 

Assessment  Summary 

OPA  staff,  drawing  upon  individual  contributions  from 
reviewers,  prepares  an  assessment  summary.  These  contributions 
include  all  the  numerical  scores  for  each  project  accompanied  by 
commentaries.  Future  needs  for  supporting  research  are  also
provided  as  commentary. 

Instructions  for  Completing  Project  Rating  Forms 

Peer  Review  Questionnaire  (Form  1) 

Reviewers  individually  rate  the  project  in  each  of  six  areas 
and  choose  an  overall  rating:  scientific  (technical)  merit,
importance  of  project,  quality  of  project  team,  scientific 
(technical)  approach,  productivity,  and  probability  of  success. 
Ratings  in  these  categories  use  a  scale  composed  of  integer  values 
from  zero  to  ten,  with  the  ends  of  the  scale  representing  seriously 
deficient  and  outstanding  attributes,  respectively.

For  Item  Q1,  "Scientific  (Technical)  Merit,"  reviewers  assess
the  importance  of  the  scientific  (technical)  question,  or  problem 
addressed,  including  the  potential  importance  or  value  to  science 
(technology)  of  meeting  the  project  objectives.  This  judgment  is 
based  primarily  on  the  reviewer's  knowledge  of  the  scientific 
(technical)  field. 

In  Item  Q2,  "Importance  of  Project,"  the  reviewer  is  to  assess 
the  importance  of  the  project's  objectives  in  terms  of  contributing 
to  the  program's  mission. 

For  Item  Q3,  "Quality  of  Project  Team,"  reviewers  consider  the
composition  and  quality  of  the  team  through  examination  of 
contributions  by  individual  and  associated  team  members  relevant  to 
the  objectives  of  this  project,  honors  and  awards,  experience 
relevant  to  the  project  area,  and  the  balance  of  appropriate  skills 
(including  collaborators),  for  accomplishing  the  project
objectives. 

For  Item  Q4,  "Scientific  (Technical)  Approach,"  reviewers 
consider  the  appropriateness  of  the  experimental  and  analytical 
methods  used  and  the  level  of  insight  and  innovation  demonstrated 
in  relation  to  the  requirements  of  the  project's  objectives. 

For  Item  Q5,  "Productivity,"  the  reviewers  consider  the 
impact,  volume,  quality,  and  usefulness  of  work  produced  by  the 
project  team  as  a  whole  and  relate  this  output  to  the  resources 
available  and  costs  incurred. 

For  Item  Q6,  "Probability  of  Success,"  reviewers  assess  the 
likelihood  that  the  project  will  accomplish  its  stated  objectives. 

Overall  Project  Evaluation 

The  overall  project  evaluation  score  is  a  weighted  judgment  by 
the  individual  reviewer  based  on  his/her  experience  and  on  the 
ratings  given  for  Items  Q1  to  Q6.  It  is  not  mathematically  derived
from  the  factor  scores.  Criteria  for  choosing  an  overall  project 
evaluation  are  also  on  Form  1. 
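
The structure of a completed Form 1 can be sketched as a small record type: six item ratings on the integer scale from zero to ten, plus an overall rating that the reviewer enters separately rather than one computed from the items. The class and function names below are hypothetical, the sample scores are invented for illustration, and the minimum/median/maximum summary is an assumption about how individual ratings might be tabulated; these procedures do not prescribe a particular analysis.

```python
from dataclasses import dataclass
from statistics import median

ITEM_LABELS = {
    "Q1": "Scientific (Technical) Merit",
    "Q2": "Importance of Project",
    "Q3": "Quality of Project Team",
    "Q4": "Scientific (Technical) Approach",
    "Q5": "Productivity",
    "Q6": "Probability of Success",
}


def _check_score(name, value):
    """Scores are integers from 0 (seriously deficient) to 10 (outstanding)."""
    if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 10:
        raise ValueError(f"{name} must be an integer from 0 to 10, got {value!r}")


@dataclass(frozen=True)
class Form1Rating:
    reviewer: str
    items: dict   # maps "Q1".."Q6" to an integer score
    overall: int  # reviewer's own judgment, not derived from the items

    def __post_init__(self):
        if set(self.items) != set(ITEM_LABELS):
            raise ValueError("ratings are required for exactly Q1 through Q6")
        for name, value in self.items.items():
            _check_score(name, value)
        _check_score("overall", self.overall)


def summarize(ratings):
    """Tabulate (min, median, max) per item without merging reviewers' views."""
    summary = {}
    for item in ITEM_LABELS:
        scores = sorted(r.items[item] for r in ratings)
        summary[item] = (scores[0], median(scores), scores[-1])
    overall = sorted(r.overall for r in ratings)
    summary["overall"] = (overall[0], median(overall), overall[-1])
    return summary


# Invented scores from three hypothetical reviewers of one project.
ratings = [
    Form1Rating("Reviewer 1", {"Q1": 8, "Q2": 7, "Q3": 9, "Q4": 6, "Q5": 7, "Q6": 5}, 7),
    Form1Rating("Reviewer 2", {"Q1": 6, "Q2": 8, "Q3": 9, "Q4": 7, "Q5": 5, "Q6": 6}, 8),
    Form1Rating("Reviewer 3", {"Q1": 9, "Q2": 7, "Q3": 8, "Q4": 7, "Q5": 6, "Q6": 4}, 6),
]
for key, stats in summarize(ratings).items():
    print(key, stats)
```

Keeping the overall rating as its own field mirrors the rule that it is a reviewer's judgment and is not mathematically derived from the factor scores.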

Responsibilities  of  Participants 

Instructions  for  Reviewers 

Reviewers  are  the  key  participants  in  the  peer  review 
assessment.  They  have  the  following  responsibilities: 

Become  familiar  with  the  peer  review  assessment  procedures
before  review  of  the  first  project.  This  includes:  (a)  reviewing 
the  "Procedures  for  Peer  Review  Assessments"  document,  which  is 
provided  prior  to  the  review  session;  (b)  attending  the  plenary 
session  at  the  start  of  the  review  for  an  overview  of  the  process 
and  a  presentation  of  mission  of  the  program  whose  projects  are  to 
be  reviewed;  and  (c)  following  the  detailed  instructions  given  by 
the  OPA  staff. 

Prior  to  each  project's  review,  read  the  pre-presentation 
package  of  information  provided  by  the  principal  investigator. 

Participate  in  the  review  of  every  project  of  the  group 
assigned.  Listen  to  the  principal  investigator's  oral  presentation
and,  in  turn,  ask  pertinent  questions  to  obtain  the  information 
needed  to  answer  the  rating  factors  of  the  questionnaire  forms. 

Reviewers  clarify  technical  points  and  rating  criteria  as 
necessary. 

Reviewers  independently  complete  a  set  of  rating  forms  on  the 
project  presented.  The  reviewers  use  the  standard  factors  and
criteria  provided  on  the  forms,  and  provide  brief  supporting
written  comments. 

Following  project  evaluations,  the  reviewers  independently 
identify  research  needs  or  opportunities  in  the  technical  area  of 
the  group's  projects. 

Instructions  for  Principal  Investigators 

Principal  investigators  provide  five  types  of  information  to 
facilitate  review  of  their  research: 

1.  statement  of  current  project  objectives,  and  an  abstract  of 
the  project; 

2.  recommendations  of  expert  reviewers  for  their  project; 

3.  notification  that  the  project  does  or  does  not  contain 
privileged  or  protected  information; 

4.  a  package  of  documentation  describing  the  project;  and 

5.  an  oral  presentation  to  the  reviewers  at  the  group  session. 

The  project  abstract,  recommendation  of  expert  reviewers,  and
notification  of  whether  or  not  the  project  contains  privileged  or
protected  information  are  forwarded  to  the  Technical  Project  Officer
by  the  Principal  Investigator  immediately  upon  receipt  of 
notification  of  the  project  review. 

In  the  pre-review  session  package,  principal  investigators 
provide  information  describing  the  project  relevant  to  the  rating 
factors  and  criteria  on  Form  1.  No  privileged  or  protected
information  is  included.  This  package  helps  the  reviewers  prepare 
to  hear  the  upcoming  oral  presentation.  The  Technical  Project 
Officer,  or  his/her  designee,  collects  the  documentation  from  the 
principal  investigator  and  forwards  copies  to  the  appropriate 
reviewers,  OPA  staff,  and  DOE  Program  Managers.  Contents  of  the 
package  are  described  in  Appendix  A. 

Appendix  A  also  lists  the  items  to  be  included  in  the 
Principal  Investigator's  oral  briefing  to  the  reviewers.  Items  5A 
(specific  project  objectives),  5B  (how  the  project  relates  to  the 
DOE  program's  mission),  6  (scientific  and  technical  content),  and  7 
(recent  project  output)  are  stressed  in  the  oral  presentation. 
Conversely,  Items  1,  3,  4,  5C,  and  5D  in  Appendix  A  (principal 
project  personnel,  additional  project  personnel,  project  history, 
and  how  the  project  relates  to  others  being  funded  by  the  DOE 
program)  are  quickly  covered  in  one  or  two  slides  or  by  handout 
material.

Investigators  supply  12  copies  of  any  reprints,  documents,  or 
material  provided  to  the  reviewers.  This  applies  to  both  the 
package  of  documentation  and  visual  aids  used  during  the  Principal 
Investigator's  oral  presentation.  The  copies  of  the  visual  aids
are  provided  to  the  reviewers  at  the  start  of  the  oral 
presentation. 

Reviewers  expect  the  oral  presentations  to  primarily  deal  with 
the  technical  content  and  output  of  the  project,  and  to  provide 
supporting  data.  Reviewers  base  their  individual  evaluations  of  a 
project  primarily  on  the  oral  presentation  made  by  the  Principal 
Investigator  and  on  his  or  her  responses  to  questions.  While  the 
package  of  documentation  may  complement  the  oral  presentation,  it 
does  not  supplant  the  requirement  for  a  self-sufficient  briefing  to 
the  reviewers  by  the  Principal  Investigator. 

Instructions  for  DOE  Program  Managers 

The  Program  Managers  from  DOE  Headquarters,  or  DOE  Field 
Offices,  have  an  important  role  in  helping  reviewers  to  understand 
the  importance  and  significance  of  the  research  being  reviewed. 

The  briefing  during  the  plenary  session  (up  to  20  minutes) 
includes:

1.  statement  of  interest  in  review  and  intended  use  of 
results;

2.  overview  of  program,  including  background,  contents,  scope, 
timing,  and  relationship  of  subparts;  and 

3.  official  program  mission  statement  from  approved  planning 
document.  (Plan  to  leave  copies  of  the  statement  for  use  by 
reviewers.)

To  prepare  reviewers  to  assess  the  projects  brought  before 
them  in  the  proper  context,  the  Program  Manager  presents  (for  up  to 
15  minutes)  the  following: 

1.  definition  of  subprogram  area(s)  represented  by  projects  to 
be  reviewed; 

2.  goals  of  these  subprogram  areas  related  to  the  official 
program  mission  statement; 

3.  how  each  project  to  be  reviewed  supports  subprogram  goals; 

4.  relation  to  other  projects  (not  assessed  by  these 
reviewers)  that  also  support  these  goals;  and 

5.  pertinent  history,  accomplishments,  and  plans  for  the 
subprogram. 

The  introductory  briefing  for  each  project  (up  to  5  minutes) 
covers:

1.  relation  of  project  objectives  to  subprogram  goals  and 
official  program  mission,  and  benefits  expected  if  project  is 
successful;

2.  summary  of  funding  and  performance  dates;  and 

3.  introduction  of  the  Principal  Investigator  and  any
observers  to  the  presentation. 
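
The time allowances given in these procedures (up to 20 minutes for the plenary program briefing, up to 15 minutes for the subprogram briefing, up to 5 minutes for each project introduction, and a 30 minute briefing by each principal investigator) can be collected into one table and used to check a draft agenda. The sketch below is illustrative: the key names, the function name, and the sample agenda are hypothetical, and treating the 30 minute principal investigator briefing as an upper limit is an assumption.

```python
# Time allowances in minutes, taken from the procedures; key names are hypothetical.
BRIEFING_LIMITS_MIN = {
    "plenary_program_briefing": 20,
    "subprogram_briefing": 15,
    "project_introduction": 5,
    "pi_briefing": 30,  # stated as a 30 minute briefing; treated here as a cap
}


def over_limit(agenda):
    """Return (title, planned minutes, limit) for each agenda entry that
    exceeds its allowance. Each entry is (title, kind, planned minutes)."""
    problems = []
    for title, kind, minutes in agenda:
        limit = BRIEFING_LIMITS_MIN[kind]
        if minutes > limit:
            problems.append((title, minutes, limit))
    return problems


# Hypothetical draft agenda, for illustration only.
agenda = [
    ("Program overview", "plenary_program_briefing", 20),
    ("Subprogram context", "subprogram_briefing", 15),
    ("Introduce Project 1", "project_introduction", 5),
    ("Project 1 briefing", "pi_briefing", 35),
]
print(over_limit(agenda))
```

The question and answer period is left out of the table because the procedures do not give it a fixed length.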
Program  Managers  also  ensure  that  project  objectives  provided
by  the  Principal  Investigator  represent  the  Program's  full 
expectations  for  that  project.  The  objectives  presented  by  the 
Principal  Investigator  are  the  basis  used  by  the  reviewers  to 
perform  their  evaluation  of  the  research. 

NONDISCLOSURE  OF  PRIVILEGED  OR  PROTECTED  INFORMATION 

Principal  Investigators  are  encouraged  to  exclude  privileged 
or  protected  information  from  their  presentation,  whenever 
possible.

Under  current  law  (the  Dole-Bayh  Act,  35  USC  Sect.  200,  et 
sea. ) .  universities,  nonprofit  organizations,  and  small  businesses 
have  the  right  to  retain  the  title  to  any  inventions  made  under  a 
Government- funded  contract  or  grant.  At  the  same  time,  it  is  a 
Department  of  Energy  (DOE)  goal  to  have  projects  assessed  by  the 
most  competent  persons  available. 

Where  it  is  determined  to  evaluate  a  project  falling  under  the 
purview  of  the  Dole-Bayh  Act  using  persons  from  outside  the 
Government,  such  as  consultants,  grantees,  and  contractors,  each 
reviewer  must  sign  a  nondisclosure  agreement  before  receiving 
access  to  information  in  which  a  university,  nonprofit 
organization,  or  small  business  claims  to  have  a  proprietary 
interest or claims is confidential. The nondisclosure agreement is
shown in Appendix B. It has been approved by the Department of
Energy's Office of General Counsel.

The  Department  of  Energy  cannot  guarantee  the  maintenance  of 
absolute  secrecy  in  the  handling  of  privileged  information 
presented  by  the  Principal  Investigators.  Consequently,  the 
Principal  Investigators  are  discouraged  from  providing  such 
information  in  their  oral  presentations.  However,  when  the 
presentation  of  privileged  information  is  essential  to  provide  a 
complete  understanding  of  the  project's  research  by  the  reviewers, 
DOE  will  make  every  attempt  to  protect  the  confidentiality  of 
presented  privileged  information  in  accordance  with  existing  laws 
and  regulations. 

The  Principal  Investigator  is  to  alert  the  Technical  Project 
Officer  (Appendix  A,  Item  II)  and  the  reviewers  if  information, 
data,  or  material  requiring  protection  is  to  be  included  in  the 
oral  presentation.  Such  information  is  not  to  be  included  in  the 
package  of  documentation  on  the  project.  The  Principal 
Investigator  is  not  to  provide  any  privileged  information  in  copies 
of  oral  presentation  visual  aids  to  be  given  to  the  reviewers. 

The reviewers and OPA staff are alerted to the presentation of
privileged or protected information by: (1) a statement by the
Principal Investigator at the beginning of his or her oral
presentation, and (2) a further notice at the time the specific
privileged information is provided in the oral presentation.

PROJECT  RATING  FORMS 

FORM 1

Reviewer # _

Panel/Project: _  Date of Review: _

PEER REVIEW QUESTIONNAIRE

Q1. Scientific or Technical Merit of the Project Objectives

0 1 2 3 4 5 6 7 8 9 10

Project objectives of central importance to advancing the science,
technology, discipline, or research area rate 9-10, project
objectives that address significant issues rate 7-8, project
objectives providing information of general usefulness and interest
rate 5-6, routine project objectives rate 3-4, and project
objectives of doubtful or peripheral interest rate 0-2.

Circle  the  appropriate  number  for  your  rating. 

Supporting  Comments: 


Q2.  Importance  of  Project  Objectives  to  Mission 

State  your  estimate  of  the  importance  of  this  project's  stated 
objectives  in  terms  of  contributing  to  the  program's  stated 
mission.  Circle  the  appropriate  number  for  your  rating. 


Not Important                Very Important

0 1 2 3 4 5 6 7 8 9 10

Supporting  Comments: 

Q3. Quality of Project Team

0 1 2 3 4 5 6 7 8 9 10

An outstanding team rates 9-10, a strong, balanced team of
experienced investigators rates 7-8, a good team that would benefit
from additional skills rates 5-6, a team that requires
strengthening rates 3-4, and a team with serious shortcomings rates
0-2.


Supporting  Comments: 


Q4. Scientific or Technical Approach

0 1 2 3 4 5 6 7 8 9 10

An expert and innovative approach rates 9-10, a skillful and
logical approach rates 7-8, a reasonable approach with potential
for improvement rates 5-6, an approach with key shortcomings or an
approach that is out-of-date rates 3-4, and an inappropriate or
illogical approach rates 0-2. Circle the appropriate number for
your rating.

Supporting  Comments: 


Q5.  Productivity 

0 1 2 3 4 5 6 7 8 9 10

With  respect  to  the  resources  available:  9-10  indicates  high 
impact,  exceptional  output,  7-8  indicates  significant  results  at  an 
extensive  rate,  5-6  indicates  interesting  results  at  a  reasonable 
rate,  3-4  indicates  marginal  output,  and  0-2  denotes  little 
evidence  of  progress.  Circle  the  appropriate  number  for  your 
rating.  If  the  project  has  not  been  under  way  long  enough  to  be 
rated  for  productivity,  so  state. 

Supporting  Comments: 


Q6.  Probability  of  Success 

State  your  estimate  of  the  probability  of  success  of  this  project 
accomplishing  its  stated  objectives.  Circle  the  appropriate  number 
for  your  rating. 

Low                High

0 1 2 3 4 5 6 7 8 9 10

Supporting  Comments: 


OVERALL  PROJECT  EVALUATION 

0 1 2 3 4 5 6 7 8 9 10

An outstanding project rates 9-10. A strong project deserving of
priority continuation rates 7-8, while a good project, deserving of
continuation, that may have some shortcomings which can be
addressed by the Principal Investigator rates 5-6. A weak project,
or one with some deficiencies requiring program management
attention rates 3-4, and a poor project with serious deficiencies
which warrants close reevaluation by program management rates 0-2.
Circle the appropriate number for your rating.

Supporting  Comments: 


FORM  2 

Reviewer  #  _ 

Panel/Project:  _  Date  of  Review: _ 

REVIEWER  SELF-RATING 

1.  Please  rate  your  knowledge  in  the  scientific/technical  research 
area  or  discipline  covered  in  this  project. 

Novice    Understand    Knowledgeable    Expert

0 1 2 3 4 5 6 7 8 9 10


Appendix  A 

INFORMATION  TO  BE  PROVIDED  AND  PRESENTED  BY  THE  PROJECT  PRINCIPAL 
INVESTIGATOR 

1)  Abstract  (one  page  maximum) 

Project  Title:  The  title  should  describe  the  current  project 
effort,  especially  for  projects  that  may  have  been  renewed. 

Objective(s): in priority order, and why are you doing this?

Body  of  abstract  which  describes  the  research  project,  and 
particularly  the  technical  approach. 

Project  Performance  Period: 

Project  Funding: 

Principal Investigator(s): Name(s), Mailing Address, Phone Number,
FAX number, E-mail address.

Principal Investigator(s) Organization:

DOE Program Manager:

Please return this form to Technical Project Officer, ER-XXX.
E-mail is preferred in ASCII format. Fax an additional copy to FAX
301-903-5561 (or 3888) as backup.


2.  Patentable  or  Potentially  Patentable  Inventions 

Principal  Investigators  are  encouraged  to  exclude  patentable, 
unpublished,  and  unfiled  information,  whenever  possible. 

Under current law (the Dole-Bayh Act, 35 USC Sect. 200, et
seq.), universities, nonprofit organizations, and small businesses
have  the  right  to  retain  the  title  to  any  inventions  made  under  a 
Government-funded  contract  or  grant.  At  the  same  time,  it  is  the 
Department  of  Energy's  goal  to  have  projects  assessed  by  the  most 
competent  persons  available.  Where  it  is  determined  to  evaluate  a 
project  falling  under  the  purview  of  this  law,  using  persons  from 
outside  the  Government,  such  as  consultants,  grantees,  and 
contractors,  a  nondisclosure  agreement  or  an  equivalent  arrangement 
is  needed  before  protected  or  privileged  information  can  be 
released  to  such  evaluators. 

If the information to be presented requires protection under
the Dole-Bayh Act, 35 USC Sect. 200, et seq., include and sign the
following statement:

"I am an owner, officer, or employee of a university,
not-for-profit organization, or small business, and the information
to be presented will contain information, data, or material
pertaining to an invention, or inventions, made under a
Government-funded contract or grant that I believe is potentially
patentable."


Signature 

Otherwise,  indicate  not  applicable  on  the  signature  line. 


Ten  Page  Research  Summary 
(Limit  to  10  pages) 


1.  Project  Title 

Principal  Investigator: 

Organization: 

Address:

Telephone Number(s):

2.  Principal  Project  Personnel 

Identify  the  important  technical  contributors  to  the  project, 
including  the  Principal  Investigator,  and  provide  the  following 
information  for  each: 

A.  Role  in  the  project. 

B.  Principal  areas  of  research  and  expertise. 

C.  An  indication  of  the  percentage  of  time,  or  annual  hours, 
each  devotes  to  the  project. 

D.  Education. 

E. Relevant professional employment history, including a list
of the institutions, dates employed, and positions held.

F. Relevant professional activities and honors. This could
include professional society activities, awards and prizes,
patents, advisory committee assignments, Congressional testimony,
or any other activities which reflect on standing within the
research community.

G. Relevant publications not emanating from this project.

(Do not include extensive lists of publications of little relevance
to the project being evaluated.)

3.  Additional  Project  Personnel 

A.  For  other  members  of  the  technical  staff:  name, 
education,  principal  areas  of  research  and  expertise,  and  role  in 
the  project. 

B.  For  collaborators  on  the  project:  name,  institution, 
position,  education,  principal  areas  of  research  and  expertise,  and 
role  in  the  project. 

4.  Project  Overview 

A.  Specific  Project  Objectives 

1)  Past  (if  project  is  a  continuation  or  extension  of 
earlier  work  or  a  follow-on  building  on  earlier  successes). 

2)  Current  (the  project  is  to  be  measured  only  against 
the  current  objectives.) 

3)  Planned  future  work  on  this  project. 

B.  How  this  project  relates  to  the  DOE  program's  mission. 

C. How this project relates to other projects being funded by
DOE (to the extent this is known by the Principal Investigator).

D.  Project  History 

1) Previous and current funding (broken out to identify
direct research funding and overhead, list DOE and non-DOE
separately).

2)  Previous  and  current  contracts  or  grants  and  their 
beginning  and  ending  dates. 

5.  Scientific  and  Technical  Content 

A.  Relation  of  this  research  to  research  being  conducted  by 
others  in  this  field. 

B.  Importance  of  solving  the  problem  being  addressed  by  this 
research. 

C.  Schedule  of  major  research  activities. 

D.  Scientific  or  technical  issues  currently  being  addressed 
and  their  significance. 

E.  Experimental  and  theoretical  approach  taken,  techniques 
used,  and  resources  applied. 

6.  Project  Output 

Provide information on the project output, including at least
the following:

A. Major recent accomplishments with supporting data and
their significance (only those products or results under the
current contract or grant).

B.  Bibliography  of  publications  emanating  from  this  project. 
From  the  bibliography,  select  no  more  than  five  of  the  most  recent, 
significant  publications  in  the  professional  or  scientific 
literature  and  submit  12  copies,  or  reprints,  of  each. 

7.  Oral  Presentation 

Key  points  for  a  Principal  Investigator  when  preparing  an  oral 
briefing: 


A.  Privileged  or  protected  information  should  be  presented 
only  at  the  oral  presentation;  then  only  if  absolutely  necessary. 

No  privileged  or  protected  information  is  to  be  included  in  the 
pre-review  session  package  of  information  describing  the  project. 

B.  Special  presentation  needs  should  be  specified  in  advance; 
otherwise,  presentation  rooms  will  be  equipped  with  one  overhead 
projector,  one  35  millimeter  slide  projector,  and  an  easel  for 
writing  or  drawing. 

C.  Investigators  supply  12  copies  of  any  reprints,  documents, 
or  material  to  be  provided  to  the  reviewers.  Exclude  privileged 
information  from  the  handouts.  Presenters  should  hand  out  12  hard 
copies  of  their  presentation  aids  to  the  reviewers  at  the  time  of 
their  presentation. 

D.  List  all  of  the  project's  objectives,  indicate  the 
percentage  resources  allocated  to  each  objective,  and  show  the 
current  status  toward  fulfillment  of  each  objective.  State  the 
scientific  importance  and  relevance  of  these  objectives  to  the  DOE 
program's  mission. 

E. Emphasize technical approaches, recent work, and recent
progress and accomplishments. In doing so, present data, charts,
equations, photographs, or other commonly recognized proof of
results.

F.  Discuss  work  performed  on  earlier  contracts  or  on  parallel 
non-DOE  contracts  only  as  needed  to  set  the  stage  for  the  current 
project.  Generally,  spend  no  more  than  five  minutes  on  earlier 
work.  Clearly  identify  work  not  performed  on  the  current  project. 


Appendix  B 

U.  S.  DEPARTMENT  OF  ENERGY 
(PROGRAM  AREA) 

PEER  REVIEW  NONDISCLOSURE  AGREEMENT 

I agree to use the information revealed during review of the
project _

only for Department of Energy (DOE) assessment purposes and to
treat the information which may be confidential in nature in
confidence. The specific type of information considered
proprietary is:


If  in  the  course  of  this  project  review,  I  do  acquire  or  have 
access  to  any  information,  data,  or  material  which  is  business 
confidential,  proprietary,  or  otherwise  privileged,  and  is  so 
indicated  in  writing,  I  agree  that  such  information  will  not  be 
divulged  to  any  person  or  any  organization  or  utilized  for  my  own 
private  purposes  or  in  any  manner  whatsoever,  other  than  in  the 
performance  of  this  project  review: 

1. without the prior written permission of the disclosing
party or the contracting officer for the work being evaluated, or

2.  until  such  information,  data,  or  material  is  first 
publicly  disseminated  by  the  DOE  or  its  contractor  or  grantee 
performing  the  work,  or 

3.  is  or  becomes  known  to  the  public  from  a  source  other 
than  me,  or 

4. is already known to me or my employer as shown by prior
records, whichever event shall first occur.


(Signature) 


(Name) 


Printed  or  Typed 


(Date)

Attachment 12 - EVALUATION FORMS FOR EXISTING PROGRAMS - LONG FORM

TITLE OF PROGRAM _

REVIEWER NAME _


1A. RESEARCH MERIT (CIRCLE ONE NUMBER OR -)

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1B. RESEARCH APPROACH/ PLAN/ FOCUS/ COORDINATION

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1C. MATCH BETWEEN RESOURCES AND OBJECTIVES

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1D. QUALITY OF RESEARCH PERFORMERS

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1E. PROBABILITY OF ACHIEVING RESEARCH OBJECTIVES

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1F. PROGRAM PRODUCTIVITY

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2A. POTENTIAL IMPACT ON MISSION NEEDS (RES/ TECH/ OPERATIONS)

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2B. PROBABILITY OF ACHIEVING POTENTIAL IMPACT ON MISSION NEEDS

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2C. POTENTIAL FOR TRANSITION OR UTILITY

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2D. PHASE OF R&D (DOD TERMINOLOGY)

6.1 - 6.2 - 6.3

BASIC RES**  *APPLIED RES**  **EXPLORATORY DEV.*  *ADV DEV*


3. REVIEWER'S EXPERTISE IN THE RESEARCH AREA OF THIS PROGRAM

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


4. OVERALL PROGRAM EVALUATION

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**

Attachment 13 - EVALUATION CRITERIA FOR EXISTING PROGRAMS

SCORING CRITERIA

The evaluation form contains factors generally related to research
and naval relevance issues. The scoring bands for all criteria except
2D are identical, and are: 1-2 (LOW); 2.5-4 (FAIR); 4.5-6.5
(AVERAGE); 7-8.5 (GOOD); 9-10 (HIGH). Criterion 2D has its own
scoring range defined.
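
The scoring bands above can be written as a small lookup. The following is a minimal sketch, not part of the original evaluation procedure. It assumes that scores move in half-point steps (one reading of the form's "CIRCLE ONE NUMBER OR -" instruction, where the dash between two numbers stands for the half point). The names BANDS and score_band are hypothetical.

```python
# Scoring bands as quoted in the text. Half-point steps are an assumption;
# under that assumption the five bands cover every valid score exactly once.
BANDS = [
    (1.0, 2.0, "LOW"),
    (2.5, 4.0, "FAIR"),
    (4.5, 6.5, "AVERAGE"),
    (7.0, 8.5, "GOOD"),
    (9.0, 10.0, "HIGH"),
]


def score_band(score: float) -> str:
    """Return the band label for a score on the 1-10 form scale."""
    # Reject anything that is not a whole or half point.
    if (score * 2) % 1 != 0:
        raise ValueError(f"score {score} is not a whole or half point")
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the 1-10 scale")
```

A caller would pass each circled score through score_band before tabulating results by band. Criterion 2D (Phase of R&D) is excluded, since the text gives it a separate scale.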

DEFINITIONS  OF  CRITERIA  ON  PROGRAM  EVALUATION  FORM 

1A. RESEARCH MERIT - Importance to the advancement of science of
the question or problem addressed by the program. Consider the
technical objectives, potential advancement of state-of-art, and
uniqueness of contribution.

1B. RESEARCH APPROACH/ PLAN/ FOCUS/ COORDINATION - Quality of
process employed to solve the research problem, including the quality
and focus of the research plan, definition of research milestones,
degree of innovation, understanding of field, balance between
experiment and theory, and coordination with (or cognizance of) other
related programs to minimize duplication or gaps.

1C. MATCH BETWEEN RESOURCES AND OBJECTIVES - Relationship
between scientific objectives proposed and total resources requested.
Also, adequacy of resources at performer level to ensure 'critical
mass' for each performing unit.

1D. QUALITY OF RESEARCH PERFORMERS - Consider publications,
honors, and awards, relevant experience, and other less tangible
factors which contribute to team quality.

1E. PROBABILITY OF ACHIEVING RESEARCH OBJECTIVES - Probability
that the program's research objectives will be achieved.

1F. PROGRAM PRODUCTIVITY - Volume and quality of work produced
and relationship of this output to the resources available, costs
incurred, and time elapsed since program initiation.

2A. POTENTIAL IMPACT ON MISSION NEEDS - Potential impact of
this program on mission research/ technology/ operational needs if
successful.

2B. PROBABILITY OF ACHIEVING POTENTIAL IMPACT ON MISSION NEEDS
- Probability that the program will achieve its potential mission
impact assuming that its research objectives have been met.

2C. POTENTIAL FOR TRANSITION OR UTILITY - Probability that
results from this program will be transitioned to or utilized by
technical community assuming that its research objectives have been
met.

2D. PHASE OF R&D - Level of program development. Scale ranges
from basic research (6.1) through exploratory development (6.2) to
advanced development (6.3).

4. OVERALL PROGRAM EVALUATION - Single number description of
overall program quality based on all relevant criteria. Provide
detailed narrative of pros and cons and any recommendations under
COMMENTS.

Attachment 14 - EVALUATION FORMS FOR PROPOSED PROGRAMS - LONG FORM

TITLE OF PROPOSED PROGRAM _

REVIEWER NAME _


1A. RESEARCH MERIT (CIRCLE ONE NUMBER OR -)

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1B. RESEARCH APPROACH/ PLAN/ FOCUS/ COORDINATION

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1C. MATCH BETWEEN RESOURCES AND OBJECTIVES

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1D. BALANCE BETWEEN EXPERIMENT AND THEORY

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


1E. PROBABILITY OF ACHIEVING RESEARCH OBJECTIVES

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2A. MISSION NEED (PROB OR NEED WHICH THIS RESEARCH ADDRESSES)


2B. POTENTIAL IMPACT ON MISSION NEEDS (RES/ TECH/ OPERATIONS)

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2C. PROBABILITY OF ACHIEVING POTENTIAL IMPACT ON MISSION NEEDS

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2D. POTENTIAL FOR TRANSITION OR UTILITY

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


2E. PHASE OF R&D (DOD TERMINOLOGY)

6.1 - 6.2 - 6.3

BASIC RES**  *APPLIED RES**  **EXPLORATORY DEV.*  *ADV DEV*


3. REVIEWER'S EXPERTISE IN THE RESEARCH AREA OF THIS PROGRAM

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**


4. OVERALL PROGRAM EVALUATION

1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9 - 10

***LOW**  ***FAIR***  ***AVERAGE****  ****GOOD****  **HIGH**

Attachment 15 - EVALUATION CRITERIA FOR PROPOSED PROGRAMS

SCORING CRITERIA

The evaluation form contains factors generally related to
research and mission relevance issues. The scoring bands for all
criteria except 2A and 2E are identical, and are: 1-2 (LOW); 2.5-4
(FAIR); 4.5-6.5 (AVERAGE); 7-8.5 (GOOD); 9-10 (HIGH). Criterion 2A
has no scoring range, and criterion 2E has its own scoring range
defined.

DEFINITIONS OF CRITERIA ON PROPOSED PROGRAM EVALUATION FORM

1A. RESEARCH MERIT - Importance to the advancement of science
of the question or problem addressed by the program. Consider the
technical objectives, potential advancement of state-of-art, and
uniqueness of contribution.

1B. RESEARCH APPROACH/ PLAN/ FOCUS/ COORDINATION - Quality of
process employed to solve the research problem, including the quality
and focus of the research plan, definition of research milestones,
degree of innovation, understanding of field, and coordination with
(or cognizance of) other related programs to minimize duplication or
gaps.

1C. MATCH BETWEEN RESOURCES AND OBJECTIVES - Relationship
between scientific objectives proposed and total resources requested.

1D. BALANCE BETWEEN EXPERIMENT AND THEORY - Balance between
experiment and theory proposed relative to optimum required to
achieve performance targets.

1E. PROBABILITY OF ACHIEVING RESEARCH OBJECTIVES - Probability
that the program's research objectives will be achieved.

2A. MISSION NEED - Identify the mission need or problem
(operational, technological, research) to which this research
relates.

2B. POTENTIAL IMPACT ON MISSION NEEDS - Potential impact of
this program on mission research/ technology/ operational needs if
successful.

2C. PROBABILITY OF ACHIEVING POTENTIAL IMPACT ON MISSION NEEDS
- Probability that the program will achieve its potential mission
impact assuming that its research objectives have been met.

2D. POTENTIAL FOR TRANSITION OR UTILITY - Probability that
results from this program will be transitioned to or utilized by
technical community assuming that its research objectives have been
met.

2E. PHASE OF R&D - Level of program development. Scale ranges
from basic research (6.1) through exploratory development (6.2) to
advanced development (6.3).

4. OVERALL PROGRAM EVALUATION - Single number description of
overall program quality based on all relevant criteria. Provide
detailed narrative of pros and cons and any recommendations under
COMMENTS.

Attachment 16 - IDENTIFYING KEY REVIEWER CRITERIA


Background

During the 1980s, a competitive process among all of ONR's
claimants was used to select new Accelerated Research Initiatives
(ARIs). In the mid to late 1980s, panels of experts external to ONR
were used to evaluate these proposed ARIs (Research Options - ROs).
From 1986-1990, 105 ROs were evaluated, and the factors which the
reviewers evaluated and scored for each RO remained essentially the
same. In 1990, the following analysis was made of the reviewers'
scores.


Purpose 

It was decided to analyze the patterns of the scores of these
105 ROs. This analysis would have the following benefits:

1. Future ROs could be improved through the feedback of observed
trends and patterns to the proposers.

2. The evaluation questionnaire could be simplified if some of
the factors proved to be unimportant in determining the final score.

3. The review process could be altered if different factors were
important for different claimants or for different technical areas.

4. The development categories (early 6.1 [6.1 is DOD terminology
for basic research], late 6.1, etc.) of different claimants' ROs
could be checked against the claimants' charters to determine whether
these charters were being followed.

Overview  of  Contents 

The present document contains an analysis of the panel reviewers'
scores. Categorizations of the data base are made to allow
parametric studies. The first section of this report contains
regressions and correlations of the scoring factors as a function of
claimant, winners/losers, technical discipline, single/multi, size,
and Phase of R&D (development category). The purpose of this first
section is to identify which factors were important to the reviewers
in determining their final score for each RO, and whether these key
factors change for different parametric values. The second section of
this report contains plots of dollars vs Phase of R&D, as a function
of claimant, POM year, technical discipline, RO size, number of
claimants proposing the RO, and winners/losers. The third section
of this report contains plots of dollars vs Overall Program
Evaluation score (OPE, the reviewers' bottom line score), as a
function of the same parameters as above.

1.  REGRESSION  ANALYSIS  RESULTS 

The factors from the reviewers' questionnaires which are used in
the regression analyses are: Research Merit (RM); Research Approach
(RA); Match Between Resources and Objectives (MBRO); Balance Between
Experiment and Theory (BBET); Potential Impact on Naval Needs (PINN);
Potential for Transition or Utility (PTU); Overall Program Evaluation
score (OPE); and Phase of R&D (in DOD terminology, research and
development category). For the main regression analysis, fifteen
different parametric variations were made with the seven factors RM,
RA, MBRO, BBET, PINN, PTU, OPE, and one run was made to show
intercorrelations among these seven evaluation factors for the total
data base. The same type of analysis was performed in each of the
fifteen runs.
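
The intercorrelation run mentioned above can be illustrated with a short sketch. It is not the original analysis. It assumes the scores are held in an array with one row per RO and one column per factor, and it uses Pearson correlation, which the text does not specify. The data are synthetic placeholders, and the function names are hypothetical.

```python
import numpy as np

# Column order is an assumption made for this illustration.
FACTORS = ["RM", "RA", "MBRO", "BBET", "PINN", "PTU", "OPE"]


def intercorrelations(scores: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix for a (ROs x factors) score array."""
    return np.corrcoef(scores, rowvar=False)


def strongest_partner(corr: np.ndarray, names: list[str], target: str) -> str:
    """Name of the factor most strongly correlated with `target`."""
    t = names.index(target)
    magnitudes = np.abs(corr[t]).copy()
    magnitudes[t] = -1.0  # ignore the trivial self-correlation
    return names[int(np.argmax(magnitudes))]


# Synthetic stand-in data: 105 rows to mirror the size of the RO data base.
# The values themselves are invented and carry no meaning.
rng = np.random.default_rng(0)
scores = rng.uniform(1, 10, size=(105, len(FACTORS)))
corr = intercorrelations(scores)
print(strongest_partner(corr, FACTORS, "OPE"))
```

With real reviewer scores, the row of the matrix for OPE would show at a glance which individual factors track the overall score most closely.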

First, a six factor model was obtained from the multiple
regression analysis to predict OPE: (OPE = b0 + b1*RM + b2*RA
+ b3*MBRO + b4*BBET + b5*PINN + b6*PTU). The three independent
variables (x1, x2, x3) with the highest regression coefficients
(b1, b2, b3) were then used in a three factor model
(OPE = b0 + b1*x1 + b2*x2 + b3*x3), and the
resultant R-Squared values (R-Squared represents the fraction of the
total variability removed by the regression) were compared to
determine the effectiveness of a three factor model relative to a six
factor model. After the highest R-Squared three factor model was
run, the independent variables (x1, x2) with the two highest
regression coefficients (b1, b2) were used in a two factor model
(OPE = b0 + b1*x1 + b2*x2). The process was repeated again going to
a one factor model (OPE = b0 + b1*x1).
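
The model reduction described above can be sketched as follows. This is an illustration under stated assumptions, not the original computation. It uses ordinary least squares and ranks factors by the absolute size of their fitted coefficients. The data are synthetic, and the names fit_ols and reduce_model are hypothetical.

```python
import numpy as np

FACTORS = ["RM", "RA", "MBRO", "BBET", "PINN", "PTU"]


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Least-squares fit of y = b0 + X @ b; returns (b0, b, r_squared)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    # R-Squared: fraction of total variability removed by the regression.
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot


def reduce_model(X, y, names, sizes=(3, 2, 1)):
    """Fit the full model, then refit keeping the largest coefficients."""
    keep = list(range(X.shape[1]))
    _, b, r2 = fit_ols(X, y)
    results = [([names[i] for i in keep], r2)]
    for size in sizes:
        # Rank by coefficient magnitude from the previous, larger model.
        order = np.argsort(-np.abs(b))[:size]
        keep = [keep[i] for i in order]
        _, b, r2 = fit_ols(X[:, keep], y)
        results.append(([names[i] for i in keep], r2))
    return results


# Synthetic stand-in for 105 ROs; the weights below are invented.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(105, 6))
weights = np.array([0.5, 0.2, 0.02, 0.0, 0.05, 0.3])
y = 0.5 + X @ weights + rng.normal(0, 0.3, size=105)

for kept, r2 in reduce_model(X, y, FACTORS):
    print(len(kept), "factor model:", kept, "R-Squared = %.3f" % r2)
```

Comparing the R-Squared values from one model size to the next shows how much explanatory power is lost as factors are dropped, which is the comparison the text describes.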

In addition to the fifteen cases mentioned above, seven other
regressions were run. OPE score was regressed against RO size (where
size is the amount of funds requested for the RO's first year) for
all ONR, CRP (an ONR unit at the time), and non-CRP; and OPE score
was regressed against Phase of R&D for all ONR, CRP, and non-CRP.
CRP Physical Sciences ROs were analyzed similarly to the fifteen
cases above.

The  results  of  the  first  fifteen  cases  are  summarized  in  Table 
1  below.  Starting  from  the  left-hand  side,  the  first  column 
describes  the  subdivision  of  the  total  RO  data  base  to  which  the 
regression  applies.  The  second  column  contains  the  value  of  R- 
Squared  for  the  six  factor  model.  The  third,  fourth,  and  fifth 
columns  contain  the  three  evaluation  factors  which  produce  the 
highest  value  of  R-Squared  of  any  three  factor  model.  These  three 
factors  always  had  the  highest  regression  coefficients  in  the  six 
factor  model,  and  these  factors  are  shown  from  left  to  right  in  order 
of  descending  magnitude  of  their  regression  coefficients.  The  sixth 
column  contains  the  value  of  R-Squared  for  the  model  which  consists 
of  the  factors  contained  in  the  previous  three  columns.  The  seventh 
and  eighth  columns  contain  the  two  evaluation  factors  which  produce 
the highest value of R-Squared of any two factor model. These two
factors  are  shown  from  left  to  right  in  order  of  descending  magnitude 
of  their  regression  coefficients.  The  ninth  column  contains  the 
value  of  R-Squared  for  the  model  which  consists  of  the  factors 
contained  in  the  previous  two  columns.  The  tenth  column  contains  the 
evaluation  factor  which  produced  the  highest  value  of  R-Squared  of 
any one factor model. The eleventh column contains the value of
R-Squared for this one factor model.

TABLE 1

SUMMARY OF REGRESSION RESULTS

(1)              (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)  (11)

                 6 FAC                   3 FAC             2 FAC       1 FAC
                 MOD                     MOD               MOD         MOD
CASE             R^2   FACTORS           R^2   FACTORS     R^2   FACT  R^2

ALL ONR          .903  RM    PTU   RA    .901  RM    PTU   .871  RM    .783
ALL WINNING      .866  RM    RA    PTU   .863  RM    PTU   .824  RM    .703
ALL LOSING       .775  PTU   RM    RA    .768  RM    PTU   .741  RM    .561

PHYS SCI         .899  RM    BBET  RA    .888  RM    RA    .869  RM    .779
ENV SCI          .914  RM    MBRO  PTU   .904  RM    MBRO  .897  RM    .840
ENG SCI          .971  PTU   RM    RA    .960  PTU   RM    .953  RM    .729
LIFE SCI         .962  RM    PTU   RA    .936  RM    PTU   .919  RM    .824
CRP              .892  RM    RA    PTU   .889  RM    RA    .865  RM    .777
NRL              .885  BBET  RM    RA    .874  BBET  RM    .860  BBET  .774
NON-CRP          .915  RM    PTU   BBET  .904  RM    PTU   .891  RM    .782
SINGLE CLAIM     .899  RM    PTU   RA    .897  RM    PTU   .870  RM    .766
MULTI CLAIM      .975  RM    MBRO  PTU   .955  RM    MBRO  .954  RM    .920
CRP SING CL      .874  RM    RA    PTU   .873  RM    RA    .829  RM    .709
NRL SING CL      .885  RM    BBET  RA    .873  BBET  RM    .859  BBET  .770
NON-CRP SING CL  .910  RM    PTU   BBET  .898  RM    PTU   .885  RM    .776


a.  General  Results 

In  all  cases  examined,  with  the  exception  of  losing  ROs,  the 
values  of  R-Squared  range  from  about  0.85  to  0.95  for  a  six  factor 
model.  Since  an  R-Squared  value  of  1.0  means  the  regression  model 
precisely explains the data set, the above results mean that the
factors  selected  in  the  ONR  evaluation  capture  the  main 
considerations  used  by  the  reviewers  to  determine  their  OPE  scores. 

In  all  cases  examined,  the  values  of  R-Squared  for  a  three 
factor  model  are  within  3%  of  the  values  of  R-Squared  for  a  six 
factor  model,  and  usually  within  1%.  These  three  factor  models 
consist  of  RM,  RA  or  one  of  its  surrogates  (MBRO,  BBET,  which  used 
to  be  included  under  RA) ,  and  except  in  the  Physical  Sciences  RO 
case,  PTU. 

In  all  cases  examined,  the  values  of  R-Squared  for  a  two 
factor model are within 4% of the values of R-Squared for a three
factor model, and usually within 2%. These two factor models
consist of RM, and either PTU, or RA or one of its surrogates. In
all cases, the drop in the value of R-Squared in going from a two
factor model to a one factor model ranges from 0.04 to about 0.2,
usually averaging about 0.1. The one factor models consist of RM,
with the exception of BBET for NRL.

The relatively small gradients in the magnitude of the value
of R-Squared in going from a six factor model to a two factor model
imply that the reviewers used two, and sometimes three, main
factors  in  deciding  the  worth  of  a  proposal.  The  choice  of  factors 
differed  for  claimants,  technical  areas,  etc.,  but  the  number  of 
key  factors  always  remained  small. 
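
The comparison of six, three, two, and one factor models described above can be illustrated with a short calculation. The following is a minimal sketch in Python, not the procedure actually used for Table 1: it assumes ordinary least squares with an intercept, the helper names (solve_linear, fit_r_squared, best_subset) are hypothetical, and the scores are invented so that OPE depends only on RM and PTU. For each model size it searches all factor combinations and keeps the one with the highest R-Squared.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_r_squared(rows, y, factors):
    """R-Squared of a least squares fit of y on the named factors."""
    design = [[1.0] + [row[f] for f in factors] for row in rows]  # intercept first
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subset(rows, y, names, size):
    """Return (factors, R-Squared) for the best model with `size` factors."""
    candidates = ((c, fit_r_squared(rows, y, c)) for c in combinations(names, size))
    return max(candidates, key=lambda pair: pair[1])


# Invented factor scores for eight hypothetical ROs (not ONR data).
FACTORS = ("RM", "PTU", "RA")
rows = [
    {"RM": 8, "PTU": 5, "RA": 6}, {"RM": 6, "PTU": 7, "RA": 4},
    {"RM": 9, "PTU": 8, "RA": 7}, {"RM": 4, "PTU": 6, "RA": 8},
    {"RM": 7, "PTU": 3, "RA": 5}, {"RM": 5, "PTU": 9, "RA": 3},
    {"RM": 3, "PTU": 4, "RA": 9}, {"RM": 10, "PTU": 6, "RA": 5},
]
# Assumed OPE scores, built from RM and PTU only so the answer is known.
ope = [0.6 * r["RM"] + 0.4 * r["PTU"] for r in rows]

for size in (3, 2, 1):
    factors, r2 = best_subset(rows, ope, FACTORS, size)
    print(size, factors, round(r2, 3))
```

With these invented scores, dropping from three factors to two leaves R-Squared essentially unchanged, while dropping to one factor lowers it noticeably. This is the same pattern the text uses to argue that reviewers relied on a small number of key factors.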

b.  Key  Specific  Results 

For  the  CRP,  research  considerations  (RM,  RA)  predominate  in 
determining  OPE,  while  for  the  non-CRP,  mission  relevance 
considerations  (PTU)  play  a  secondary  but  non-negligible  role 
relative  to  RM  in  determining  OPE.  This  implies  that,  to  some 
extent,  the  reviewers  are  applying  weightings  to  different  factors 
which  go  beyond  the  technical  discipline  under  consideration  and 
depend on the proposing organization.

For NRL, BBET plays the primary role in determining OPE, and
RM plays a secondary but non-negligible role in determining OPE.

In the regressions of OPE against RO size, no correlations
were  observed.  Thus,  OPE  score  is  independent  of  RO  size. 

In  the  regressions  of  OPE  score  against  Phase  of  R&D,  no 
correlations  were  observed  (R-Squared  approximately  zero) .  The 
conclusion  is  that  OPE  score  is  independent  of  Phase  of  R&D. 

2.  PHASE  OF  R&D  ANALYSIS  RESULTS 


The  Phase  of  R&D  factor  reflects  the  reviewers'  judgement  as 
to  where  an  RO  lies  along  the  6.1  -  6.2  -  6.3  spectrum.  A  picture 
of  how  all  ONR  ROs,  or  subdivisions  thereof,  are  distributed  across 
this  spectrum  is  valuable  for  understanding  whether  ONR  claimants 
are  following  their  charters  relative  to  basic/  applied  research, 
and  for  gaining  general  insight  into  the  program.  Forty  nine 
separate  cases  were  analyzed,  and  the  results  are  presented  as 
histograms  (distributions  by  discrete  bands)  of  ROs'  first  year 
dollars  across  the  different  phases  of  R&D. 

The results for the first level ONR categorizations are
summarized in Figures 2-A to 2-G. These figures contain
distributions  (by  discrete  bands)  of  Research  Options'  first  year 
dollars  across  the  different  phases  of  R&D  for  different  parameter 
combinations.  On  all  of  these  figures,  the  top  band  represents  the 
first  year  dollar  value  of  Research  Options  whose  panel-averaged 
Phase  of  R&D  scores  placed  them  in  the  earliest  stages  of  basic 
research.  The  next  to  the  top  band  contains  ROs  judged  to  be  in 
the  intermediate  stages  of  basic  research.  Within  the  band  which 
bounds  basic  and  applied  research  (labeled  basic/appl) ,  the 
specific programs above the midpoint of the band are counted as
basic research and those below are counted as applied research. As
the  bands  proceed  further  downward,  the  research  becomes  more 
applied. 


ALL ONR ANALYSIS-FIGURE 2-A

VERY BASIC : xxxxxxxxx
BASIC      : xxxxxxxxxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxxxx
APPLIED    : xxxx
VERY APPL  : x
             0                 20
             $M

For  ALL  ONR,  the  distribution  is  reflective  of  a  mission- 
oriented  basic  research  program,  with  the  highest  dollar  amplitude 
in  the  middle  of  the  basic  research  region,  and  a  modest  dollar 
amplitude  at  the  upper  and  lower  bounds  of  the  basic  research 
region.  About  84%  of  the  total  RO  funds  are  in  basic  research,  and 
the  remainder  are  in  applied  research.  Since  the  ONR  annual 
guidance  to  the  claimants  suggests  a  basic/  applied  research  split 
of  about  80%  basic  and  20%  applied,  it  can  be  inferred  that  the 
claimants  are  indeed  following  the  guidance  for  the  present  case. 
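
The dollar-weighted basic/applied split quoted above can be reproduced from RO-level data with a short calculation. The sketch below, in Python, is an illustration only: the 0 to 10 Phase of R&D scale, the band edges, and the dollar amounts are invented, and the helper names (band_of, dollars_by_band, basic_share, render) are hypothetical. It sums first year dollars by band and counts each RO as basic or applied according to which side of the midpoint of the basic/applied band its score falls on. Treating a score exactly at the midpoint as basic is an added assumption.

```python
# Hypothetical band edges on an invented 0-10 Phase of R&D scale
# (higher score = more basic). The Handbook does not give the real edges.
BANDS = [
    ("VERY BASIC", 8.0, 10.0),
    ("BASIC", 6.0, 8.0),
    ("BASIC/APPL", 4.0, 6.0),
    ("APPLIED", 2.0, 4.0),
    ("VERY APPL", 0.0, 2.0),
]
BOUNDARY = 5.0  # midpoint of the BASIC/APPL band


def band_of(score):
    """Return the name of the band containing a panel-averaged score."""
    for name, low, high in BANDS:
        if low <= score < high or (score == high == 10.0):
            return name
    raise ValueError(f"score out of range: {score}")


def dollars_by_band(ros):
    """Sum first year dollars ($M) of (score, dollars) pairs by band."""
    totals = {name: 0.0 for name, _, _ in BANDS}
    for score, dollars in ros:
        totals[band_of(score)] += dollars
    return totals


def basic_share(ros):
    """Fraction of dollars in ROs scored at or above the band midpoint."""
    total = sum(dollars for _, dollars in ros)
    basic = sum(dollars for score, dollars in ros if score >= BOUNDARY)
    return basic / total


def render(totals, width=20):
    """Print a text histogram scaled so the largest band is `width` wide."""
    peak = max(totals.values())
    for name, amount in totals.items():
        print(f"{name:<11}: " + "x" * round(width * amount / peak))


# Invented (Phase of R&D score, first year $M) pairs, not ONR data.
ros = [(9.1, 4.0), (7.2, 8.0), (6.5, 2.0), (5.4, 2.0),
       (4.6, 1.5), (3.0, 2.0), (1.0, 0.5)]

render(dollars_by_band(ros))
print(f"basic share: {basic_share(ros):.0%}")
```

The computed share can then be compared directly with the 80% basic and 20% applied split suggested in the ONR annual guidance.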

CLAIMANT ANALYSIS-FIGURE 2-B

CRP

VERY BASIC : xxxxxxxxxxx
BASIC      : xxxxxxxxxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxx
APPLIED    : x
VERY APPL  :
             0                 50
             $M

NRL

VERY BASIC :
BASIC      : xxxxxxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxxxxxxxxx
APPLIED    : xxxxxxxxxxxxxx
VERY APPL  : xxx
             0                  6
             $M

ARP

VERY BASIC : xxxxx
BASIC      : xxxxxxxx
BASIC/APPL : xxxxxxxxxxxxxx
APPLIED    : xxxxxxxxxxxxxxxxxxxx
VERY APPL  : xxxxxxxxx
             0                  4
             $M

SMALL CLAIMANTS

VERY BASIC : xxxxxxxxx
BASIC      : xxxxxxxxx
BASIC/APPL : xxxxxxxxxxxxxxxxxxxx
APPLIED    : xxxxxxxxxxxx
VERY APPL  : xxx
             0                  3
             $M


The  CRP's  distribution  is  centered  in  the  basic  research 
region,  while  NRL's  distribution  is  centered  on  the  basic/  applied 
research  boundary.  Since  NRL  is  a  full  spectrum  R&D  laboratory, 
the  researchers  would  probably  be  intermixed  with,  or  may  also  be 
working in, the higher category levels of development. The more
applied flavor of the proposed NRL research relative to that of the
CRP  may  be  a  reflection  of  the  closer  ties  of  the  NRL  researchers 
to  the  ongoing  NRL  development  work,  and  would  also  be  reflective 
of  more  definable  transition  paths  for  the  research. 

Compared  to  the  CRP  and  NRL,  the  ARP's  (an  applied  research 
unit  within  ONR)  distribution  is  distinctly  different,  peaking  near 
the  center  of  the  applied  research  region.  In  particular,  the  CRP 
and  ARP  distributions  appear  to  form  a  complementary  set, 
overlapping  at  the  basic/applied  research  boundary.  This  is  a 
heartening  result,  for  it  reflects  the  separate  but  tandem  missions 
established  for  these  two  organizations.  It  shows  further  that  the 
ARP  has  been  able  to  sustain  the  precarious  position  of  remaining 
centered  within  the  applied  research  region  without  drifting  into 
exploratory  development. 

TIME TREND ANALYSIS-FIGURE 2-C

POM 87

VERY BASIC : xxxxxxxxxx
BASIC      : xxxxxxxxxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxxx
APPLIED    : xxxx
VERY APPL  :
             0                 15
             $M

POM 88

VERY BASIC : xxxxx
BASIC      : xxxxxxxxxxxxxxx
BASIC/APPL : xxxxxx
APPLIED    :
VERY APPL  : x
             0                 22
             $M

POM 89

VERY BASIC : xxxxxx
BASIC      : xxxxxxxxxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxxxxxxxxxxx
APPLIED    : xxxxxxxxxxx
VERY APPL  :
             0                 13
             $M

POM 90

VERY BASIC : xxxxxxxxxxxxxxxxxx
BASIC      : xxxxxxxxxxxxx
BASIC/APPL : xxxxxxxxxxxxx
APPLIED    : xxxx
VERY APPL  : xx
             0                 13
             $M

When POM year is varied, there do not appear to be any
discernible monotonic trends with time.

The ONR Physical Science ROs are concentrated mainly in the
basic  research  region,  with  a  very  modest  amount  tapering  off  into 
the  applied  research  region.  The  Environmental  Sciences  ROs  appear 
to have a deficiency in the center of the basic research region.

One partial explanation results from the following observations
over the past five POMs. The Ocean Sciences/Atmospheric Sciences
components of Environmental Sciences tend to be fairly fundamental
in nature, and many of them would fit in the top band. However,
many Acoustics ROs have been quite sizable, and tend to be more in
the direction of applied research. These would probably populate
the band on the boundary of basic/applied research.

The ONR Engineering Sciences ROs have an absence of dollars in
the most fundamental research band, which also correlates with
observations over the past five POMs. The remainder of the
Engineering  Sciences  distribution  parallels  that  of  the  Physical 
Sciences  ROs  very  closely.  The  Life  Sciences  RO  distribution 
appears  almost  totally  concentrated  in  the  middle  of  the  basic 
research  region. 


SIZE ANALYSIS-FIGURE 2-E

By arbitrary definition, large ROs have first year funding
greater than $1 million, and small ROs have first year funding less
than or equal to $1 million. While the distribution for small ROs
is broader than the distribution of large ROs, there appears to be
little difference in Phase of R&D, for the distribution means,
between the large and small ROs for all ONR, for the CRP, and for
the non-CRP.

SINGLE VS MULTI-CLAIMANT ANALYSIS-FIGURE 2-F

The ONR single and multi-claimant distributions appear to have
about  the  same  means.  The  bands  on  both  extremes  of  the  single 
claimant  distribution  are  either  reduced  or  eliminated  on  the  multi 
claimant  distribution.  Personal  observations  over  the  past  five 
POMs  lead  to  the  conclusion  that  the  addition  of  claimants  to  an  RO 
proposal  tends  to  have  the  effect  of  adding  'filters',  with 
extremes  being  eliminated.  Further,  because  of  the  diversities  in 
Phase  of  R&D  contributed  by  each  of  the  claimants,  and  the 
requirement  that  each  RO  be  given  only  one  score  for  this  factor, 
there  tends  to  be  an  averaging  by  the  reviewers,  a  diffusive 
process  which  has  the  effect  of  'trimming  the  wings'  of  the  factor 
distribution. 


WINNERS VS LOSERS ANALYSIS-FIGURE 2-G

Phase of R&D score appears to have no discernible impact on
whether  an  RO  will  win  or  lose,  for  ONR  as  a  whole,  or  for  the  CRP. 
Phase  of  R&D  may  have  a  slight  influence  on  whether  a  non-CRP  RO 
will  win  or  lose,  but  this  may  be  due  to  some  other  factor  which  is 
highly  correlated  with  Phase  of  R&D. 


3.  Overall  Program  Evaluation  Score  Analysis 

OPE  is  the  factor  which  has  the  strongest  influence  on  the 
final  RO  score.  Study  of  the  distribution  of  dollars  among  the  OPE 
scoring  bands  for  all  ONR  ROs,  or  subdivisions  thereof,  can 
identify  strengths  or  weaknesses  in  various  components  of  the 
program.  Forty  nine  separate  cases  were  analyzed,  and  the  results 
are  presented  as  histograms  (distributions  by  discrete  bands)  of 
ROs' first year dollars across the different OPE scoring bands.

The results for first level ONR categorizations are summarized
in Figures 3-A to 3-G. These figures contain distributions (by
discrete bands) of Research Options' first year dollars as a
function of Overall Program Score for different parameter
combinations. On all of these figures, the top band represents the
first year dollar value of Research Options whose panel consensus
Overall Program Evaluation Scores placed these ROs in the
Fair-Average category. The next band to the top can be viewed as
Average-Good;  the  next  band  below  can  be  viewed  as  Good-Very  Good; 
and  the  bottom  band  can  be  viewed  as  High  or  Outstanding. 

ALL ONR ANALYSIS-FIGURE 3-A

For all ONR proposed ROs, the bulk are in the Good - Very Good
range, which corroborates personal observation over the past five
POMs. The proposed ROs which come from the claimants for the
overall competition typically have not been reviewed formally by
expert external panels. It is conjectured that a rigorous
pre-review by external expert panels convened by the claimants would
filter out the Fair-rated and most of the Average-rated ROs.

CLAIMANT ANALYSIS-FIGURE 3-B

The CRP distribution is very similar to that of the total ONR,
with the exception that there are slightly smaller dollar fractions in
the two lower score bands. The major differences between the CRP
and NRL distributions seem to be that the CRP has a higher dollar
fraction in the Outstanding band and the NRL has a somewhat higher
dollar fraction in the Average-Good band.

TIME TREND ANALYSIS-FIGURE 3-C

POM 87

FAIR/AVER     : xx
AVER/GOOD     : xxxxx
GOOD/VERYGOOD : xxxxxxxxxxxxxxxxxxxx
HIGH          : xxxxxxxxxx
                0                 18
                $M

POM 88

FAIR/AVER     :
AVER/GOOD     : xxxxxxx
GOOD/VERYGOOD : xxxxxxxxxxxxxxxxxx
HIGH          : xx
                0                 25
                $M

POM 89

FAIR/AVER     : x
AVER/GOOD     : xxxxxxxxx
GOOD/VERYGOOD : xxxxxxxxxxxxxxxxxxxx
HIGH          : xxxxx
                0                 20
                $M

POM 90

FAIR/AVER     : xxxx
AVER/GOOD     : xxxxx
GOOD/VERYGOOD : xxxxxxxxxxxxxxxxxxx
HIGH          : xxxx
                0                 20
                $M

There do not seem to be any major observable trends with
time,  and  the  main  common  feature  among  the  different  POM  year 
results  is  that  the  highest  proportion  of  ROs  are  scored  in  the 
Good-Very  Good  band.  Unfortunately,  no  method  appears  to  have  been 
discovered  for  eliminating  proposals  in  the  Fair-Aver  band  or 
improving  the  overall  average  quality  of  a  POM  year's  proposals. 

TECHNICAL DISCIPLINE ANALYSIS-FIGURE 3-D

[Figure 3-D panels include: ENGINEERING SCIENCE, LIFE SCIENCE]

ONR Physical Sciences and Life Sciences distributions are
quite  similar.  Relative  to  these  two  distributions,  the 
Environmental  Sciences  distribution  has  a  greater  dollar  fraction 
in  the  Average-Good  band  (the  other  three  bands  having  about  the 
same  dollar  fraction)  and  the  Life  Sciences  distribution  has  a 
greater  dollar  fraction  in  the  Outstanding  band. 

The OPE scores presented here are actual non-normalized panel
consensus  scores.  Each  of  the  technical  areas  discussed  here  was 
nominally  evaluated  by  one  or  more  expert  panels.  Thus, 
differences  in  distributions  and  mean  scores  among  panels  could  be 
due  to  differences  in  quality  of  the  proposals,  or  could  be  due  to 
differences  in  how  reviewers  interpret  the  definitions  of  the 
scoring  bands.  There  has  been  a  normalization  done  on  panel  scores 
for  the  past  three  POM  years.  In  the  normalization,  it  is  assumed 
that  half  the  difference  between  any  two  panels'  mean  scores  is  due 
to  a  quality  difference  in  the  proposals,  and  the  other  half  of  the 
difference  is  due  to  the  relative  severity  of  the  panelists  in 
assigning  scores.  It  is  the  normalized  scores  which  determine  the 
final  scores  and  prioritizations  of  the  proposals.  However, 
personal  observations  and  informal  'shadow'  reviews  over  the  past 
five  POMs  confirm  the  findings  of  the  distributions  in  this 
section.  Most  notably,  the  Life  Science  ROs  tend  to  have  a  few 
more  Outstanding  contributors  than  those  of  the  other  disciplines, 
and  the  Environmental  Science  ROs  tend  to  have  more  of  a 
contribution  of  Average  members. 
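
One reading of the normalization rule described above is that each panel's scores are shifted by half the gap between that panel's mean and the mean of all panel means, so that the gap between any two panels' means is cut in half. The sketch below, in Python, illustrates only that reading: the panel names and scores are invented, normalize_panels and its severity_share parameter are hypothetical, and the text does not give the formula actually used.

```python
from statistics import mean


def normalize_panels(raw, severity_share=0.5):
    """Shift each panel's scores toward the mean of the panel means.

    raw maps a panel name to its list of consensus OPE scores.
    severity_share is the assumed fraction of any gap between panel means
    attributed to reviewer severity (one half in the text) and removed.
    """
    panel_means = {panel: mean(scores) for panel, scores in raw.items()}
    reference = mean(list(panel_means.values()))
    return {
        panel: [s - severity_share * (panel_means[panel] - reference) for s in scores]
        for panel, scores in raw.items()
    }


# Invented consensus scores for two hypothetical panels (not ONR data).
raw_scores = {
    "panel_a": [8.0, 7.0, 9.0],  # mean 8.0, the more lenient panel
    "panel_b": [6.0, 7.0, 5.0],  # mean 6.0, the more severe panel
}
normalized = normalize_panels(raw_scores)
for panel, scores in normalized.items():
    print(panel, scores, "mean", mean(scores))
```

In this example the raw panel means differ by 2.0 and the normalized means differ by 1.0. The ordering of proposals within each panel is unchanged; only the comparison across panels shifts.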

The large ROs seem to score slightly higher than the small
ROs.  However,  this  may  be  due  to  the  arbitrary  choice  of  a 
dividing  line  between  large  and  small.  In  the  regression  section 
of  this  report,  OPE  was  correlated  with  RO  size,  with  no  arbitrary 
dividing  lines  present,  and  OPE  score  was  shown  to  be  independent 
of  RO  size. 


SINGLE VS MULTI-CLAIMANT ANALYSIS-FIGURE 3-F

The distributions of ONR single and multi claimancy are quite
similar,  and  the  means  appear  about  the  same.  The  CRP  single  and 
multi  claimancy  distributions  are  very  similar.  While  the  non-CRP 
multiclaimant  ROs  have  a  higher  fraction  of  Outstanding/Very  Good 
dollars,  they  also  have  a  higher  fraction  of  Average/Very  Good 
dollars.  There  appears  to  be  no  major  difference  between  the  two 
distributions.  The  CRP  single  claimant  distribution  has  a  smaller 
dollar  fraction  in  the  lower  bands,  and  a  larger  dollar  fraction  in 
the  higher  bands,  than  the  non-CRP  single  claimant  distribution. 

The  same  holds  true  for  the  CRP  multiclaimant  distribution  relative 
to  the  non-CRP  multiclaimant  distribution.  Since  the  CRP  is 
essentially  a  partner  to  all  multiclaimant  ROs  (with  a  few 
exceptions) ,  if  it  had  the  same  share  of  all  multiclaimant  ROs,  the 
CRP  and  non-CRP  multiclaimant  distributions  would  be  identical. 

The  fact  that  the  CRP  distribution  reflects  higher  scores  than  the 
non-CRP  distribution  means  that  the  multiclaimant  ROs  with  higher 
CRP  contribution  score  higher  than  those  with  lower  contribution. 

WINNERS VS LOSERS ANALYSIS-FIGURE 3-G

The bulk of the winning ONR ROs are in the Good range or higher;
the  bulk  of  the  losing  ROs  are  below  the  Good  range,  and  there  is 
some  overlap.  It  should  be  noted  that  the  next  to  the  bottom  band 
contains  ROs  whose  OPE  scores  range  from  7.0  to  8.5.  Personal 
observations  over  the  past  five  POMs  lead  to  the  conclusion  that 
there  is  a  substantial  difference  between  ROs  at  the  upper  end  of 
this  range  and  at  the  lower  end.  Most  of  the  losing  ROs  in  this 
range  scored  at  the  lower  end.  There  is  a  small  fraction  of 
winners  in  the  Average-Good  band.  These  are  un-normalized  scores; 
some  of  the  final  scores  were  increased  due  to  the  normalization 
procedure. Also, in different POM years, the threshold values for
funding ROs differed.

Attachment 17 - TECHNICAL/PROGRAMMATIC ISSUES FOR PROGRAM REVIEW

A) TECHNICAL ISSUES

1. FOR EACH COMPONENT OF THE APPLIED RESEARCH PROGRAM, ADDRESS THE
FOLLOWING:

a. WHAT ARE THE TECHNICAL OBJECTIVES?

b. WHAT ARE THE KEY TECHNICAL ROADBLOCKS TO BE OVERCOME?

c.  WHY  WAS  THE  PARTICULAR  TECHNICAL  APPROACH  CHOSEN? 

d.  WHAT  IS  THE  FEASIBILITY  OF  THE  TECHNICAL  APPROACH  FOR 
ACHIEVING  THE  TECHNICAL  OBJECTIVES? 

e.  IDENTIFY  THE  PROGRESS  AND  ACCOMPLISHMENTS  MADE  TOWARD 
ACHIEVING  THE  OBJECTIVES. 

f.  IDENTIFY  THE  RISK  IN  ACHIEVING  THE  OBJECTIVES. 

g.  WHAT  ARE  THE  PROJECTED  CAPABILITIES  THE  COMPONENT  WILL 
PROVIDE  AND  HOW  WILL  THEY  CONTRIBUTE  TO  THE  TOTAL  PROGRAM;  HOW  DO 
THESE  CAPABILITIES  COMPARE  WITH  THE  STATE-OF-THE-ART  AND  WITH 
POTENTIAL  CAPABILITIES  OF  OTHER  TECHNICAL  APPROACHES? 

h.  WHAT  MORE  FUNDAMENTAL  RESEARCH  RESULTS  ARE  UTILIZED  TO 
INSURE  SUCCESS  OF  THE  PROGRAM?  IF  NEEDED  FUNDAMENTAL  RESEARCH 
INFORMATION  IS  NOT  AVAILABLE,  WHAT  FALLBACK  POSITIONS  EXIST? 

2. IF THE PROGRAM OBJECTIVES ARE ACHIEVED, WHAT IS THE PROBABILITY
THAT THE INDIVIDUAL COMPONENTS AND/OR THE TOTAL PROGRAM ARE
TRANSITIONABLE? WHAT IS THE EVIDENCE TO SUPPORT YOUR RESPONSE?

3.  WHAT  IS  THE  LOGICAL  STRUCTURE  AND  PROGRESSION  OF  THE  TEST 
PROGRAM?  WHAT  VALIDATIONS  WILL  BE  ACHIEVED  FROM  EACH  STEP  OF  THE 
TEST  PROGRAM,  INCLUDING  LAB  TESTS  AND  FIELD  TESTS? 

4. WHAT IS THE TECHNICAL FOCUS OF THE TOTAL PROGRAM? HOW ARE
DISCRETE  COMPONENTS  BEING  INTEGRATED  INTO  A  UNIFIED  PROGRAM? 

5.  WHAT  IS  THE  BALANCE  BETWEEN  RESOURCES  AND  TECHNICAL  OBJECTIVES? 
IS  THE  TOTAL  PROGRAM  SUFFICIENTLY  FOCUSED  FOR  THE  RESOURCES,  OR  IS 
IT  TOO  DILUTED  AMONG  THE  DIFFERENT  COMPONENTS? 


B)  PROGRAMMATIC  ISSUES 

1. WHAT IS THE MANAGEMENT AND WORK BREAKDOWN STRUCTURE OF THE
PROGRAM?

2. WHAT ARE THE MILESTONES TO ACHIEVE THE PROGRAM OBJECTIVES; WHAT
WILL BE DEMONSTRATED, AND WHEN?

3.  WHAT  ARE  THE  CRITICAL  PATHS,  AND  HOW  COULD  THEY  IMPACT  THE 
SCHEDULE? 

4. FUNDING DISTRIBUTION BY TASK AND PERFORMER FOR EACH YEAR.

5. CHANGES IN SCOPE FROM ORIGINAL PLANS, AND RATIONALE SUPPORTING
THESE CHANGES.

6. PROGRAM SHORTFALLS TO DATE, IMPACT ON OVERALL GOALS, AND PLANS
FOR MITIGATION.

7.  PROGRAM  COORDINATION  WITH  OTHER  AGENCIES  AND  WITH  INDUSTRY,  BOTH 
DOMESTIC  AND  FOREIGN. 

8.  HOW  WOULD  THE  PROGRAM  BE  AFFECTED  IF  THE  MONEY  WERE  SPREAD  OVER 
FOUR  YEARS  INSTEAD  OF  THREE  YEARS;  TWO  YEARS  INSTEAD  OF  THREE 
YEARS;  HOW  WOULD  THIS  AFFECT  RISK? 


EVALUATION  CRITERIA  FOR  APPLIED  RESEARCH  PROGRAM  REVIEW 
I)  TECHNICAL  CRITERIA 

PROVIDE  COMMENTS  ON  THE  TECHNICAL  ISSUES  IDENTIFIED  ABOVE  AND  ANY 
OTHER  TECHNICAL  ISSUES  WHICH  YOU  FEEL  ARE  RELEVANT  TO  THIS  PROGRAM. 
ADDRESS  STRENGTHS  AND  WEAKNESSES,  AND  INCLUDE  RECOMMENDATIONS  FOR 
IMPROVING  THE  PROGRAM. 


II)  PROGRAMMATIC  CRITERIA 

PROVIDE  COMMENTS  ON  THE  PROGRAMMATIC  ISSUES  IDENTIFIED  ABOVE  AND  ANY 
OTHER  PROGRAMMATIC  ISSUES  WHICH  YOU  FEEL  ARE  RELEVANT  TO  THIS 
PROGRAM.  ADDRESS  STRENGTHS  AND  WEAKNESSES,  AND  INCLUDE 
RECOMMENDATIONS  FOR  IMPROVING  THE  PROGRAM. 


ALTERNATIVE  APPLIED  RESEARCH  PROGRAM  EVALUATION  FORM 
REVIEWER'S  NAME _ 

1.  IS  THE  INVESTMENT  STRATEGY  APPROPRIATE  FOR  AN  APPLIED  XXXXXXXXXX 
RESEARCH  PROGRAM?  WAS  THE  PRIORITIZATION  AND  ALLOCATION  OF  RESOURCES 
AMONG  RESEARCH  COMPONENTS  SUPPORTED  BY  A  LOGICAL  RATIONALE?  IS  THERE 
AN APPROPRIATE BALANCE BETWEEN REQUIREMENTS-DRIVEN (TOP-DOWN) AND
OPPORTUNITIES-DRIVEN  (BOTTOM-UP)  APPLIED  RESEARCH  IN  THE  PROGRAM? 
HOW  CAN  VERTICAL  INTEGRATION  WITHIN  THE  PROGRAM  BE  IMPROVED? 

2.  FOR  EACH  RESEARCH  COMPONENT  OF  THE  XXXXXXXXXXXX  RESEARCH  PROGRAM, 
ADDRESS  THE  FOLLOWING: 

2a.  ARE  THE  TECHNICAL  OBJECTIVES  CLEAR  AND  RELATED  TO  THOSE  OF 
THE  TOTAL  PROGRAM? 

2b. ARE THE KEY TECHNICAL ROADBLOCKS TO BE OVERCOME IDENTIFIED?

2c. IS THE PARTICULAR TECHNICAL APPROACH CHOSEN APPROPRIATE?

2d. IS THE TECHNICAL APPROACH FOR ACHIEVING THE TECHNICAL
OBJECTIVES FEASIBLE?

2e. ARE THE PROGRESS AND ACCOMPLISHMENTS MADE TOWARD ACHIEVING
THE  OBJECTIVES  ACCEPTABLE? 

2f. ARE THE RESEARCH TECHNICAL QUALITY AND PRODUCTIVITY
SUFFICIENT?

2g. IS THE RISK APPROPRIATE IN ACHIEVING THE OBJECTIVES?

2h.  ARE  THE  PROJECTED  CAPABILITIES  THE  COMPONENT  WILL  PROVIDE 
AND  CONTRIBUTE  TO  THE  TOTAL  PROGRAM  ADEQUATE;  HOW  DO  THESE 
CAPABILITIES  COMPARE  WITH  THE  STATE-OF-THE-ART  AND  WITH  POTENTIAL 
CAPABILITIES  OF  OTHER  TECHNICAL  APPROACHES? 

3.  IF  THE  PROGRAM  OBJECTIVES  ARE  ACHIEVED,  WHAT  IS  THE  PROBABILITY 
THAT  THE  INDIVIDUAL  COMPONENTS  AND/OR  THE  TOTAL  PROGRAM  ARE 
TRANSITIONABLE?  WHAT  IS  THE  EVIDENCE  TO  SUPPORT  YOUR  RESPONSE? 

4.  WHAT  IS  THE  TECHNICAL  FOCUS  OF  THE  TOTAL  PROGRAM?  HOW  ARE 
DISCRETE  COMPONENTS  BEING  INTEGRATED  INTO  A  UNIFIED  PROGRAM? 

5.  WHAT  IS  THE  BALANCE  BETWEEN  RESOURCES  AND  TECHNICAL  OBJECTIVES? 
IS  THE  TOTAL  PROGRAM  SUFFICIENTLY  FOCUSED  FOR  THE  RESOURCES,  OR  IS  IT 
TOO  DILUTED  AMONG  THE  DIFFERENT  COMPONENTS?  IS  THERE  AN  APPROPRIATE 
BALANCE  AMONG  ANALYSIS,  THEORY,  COMPUTER  MODELING,  LAB  TESTING,  FIELD 
TESTING,  AND  HARDWARE  DEVELOPMENT? 

6.  IS  THE  PROGRAM  COORDINATION  WITH  OTHER  FEDERAL  AND  STATE  AGENCIES 
AND  INDUSTRY  (AND  FOREIGN,  IF  APPLICABLE)  ADEQUATE?  IS  THERE 
SUFFICIENT  LEVERAGING  OF  THESE  LARGER  EXTERNAL  PROGRAMS? 


PROVIDE  COMMENTS  ON  THE  TECHNICAL  ISSUES  IDENTIFIED  ABOVE  AND  ANY 
OTHER  TECHNICAL  ISSUES  WHICH  YOU  FEEL  ARE  RELEVANT  TO  THIS  PROGRAM. 
ADDRESS  STRENGTHS  AND  WEAKNESSES,  AND  INCLUDE  RECOMMENDATIONS  FOR 
IMPROVING THE PROGRAM.

Attachment 18 - REVIEW PROTOCOL FOR SMALL SEED MONEY PROJECTS

Many  organizations  have  special  programs  which  consist  of  small, 
high  risk,  finite  duration  projects.  These  programs  have  a  variety 
of  names,  such  as  seed  money  or  independent  research.  They  may  have 
a  variety  of  purposes,  such  as  attracting  high  level  staff, 
maintaining  staff  technical  competency,  maintaining  awareness  of  the 
cutting  edge  external  R&D  community,  and  identifying  future 
investment  areas  for  the  organization.  Because  of  these  projects' 
small  size  and  high  risk  nature,  high  intensity  assessments  during 
their  lifetimes  may  be  counterproductive.  The  remainder  of  this 
section  describes  a  protocol  for  evaluating  these  projects  at  the 
completion  of  their  execution  phase.  The  protocol  combines  the  best 
of  several  different  agencies'  review  practices  of  small  projects, 
and  recommends  inclusion  of  some  unique  features. 

For  purposes  of  this  discussion,  it  is  assumed  that  the  central 
evaluation  mode  is  panel  peer  review.  The  underlying  review 
philosophy is that it is neither cost-effective nor necessary for each
project  to  be  presented  in  its  entirety  before  the  panel,  as  would  be 
the  case  with  larger  sized  projects.  If  the  main  purpose  of  the 
program  is  to  help  the  organization  position  itself  for  the  future  in 
cutting  edge  science  and  technology,  then  the  project  presentations 
need  contain  only  that  threshold  amount  of  information  which  will 
describe  the  investment  strategy  that  leads  to  the  stated 
organizational goal. However, since Lotka's Law states that only a
small  percentage  of  research  projects  will  have  substantial  payoff, 
and  assessment  studies  have  shown  that  organizations  need  to  have 
these  few  'heavy-hitters'  to  maintain  vigor  and  viability,  a  few 
expanded  presentations  of  the  best  projects  will  be  required  to 
determine  whether  the  organization  has  its  share  of  high  payoff 
potential  research  projects. 

For most of the projects presented, two or three viewgraphs of
material  would  be  sufficient.  These  viewgraphs  should  contain  very 
short  statements  of  the  research  objectives,  the  technical  approach, 
the  potential  payoff  to  the  organization  (relevance  to  the 
organization's  mission),  results  obtained,  research  products 
generated  (paper  and  patent  references,  etc.),  and  coordination  with 
other  organizations  (relation  to  complementary  work  in  other 
organizations). Total presentation time for each of these projects
should  not  exceed  three  or  four  minutes.  The  best  of  the  projects 
would  have  presentation  time  expanded  to  about  15  minutes  per 
project,  would  have  more  focus  on  results  and  transition 
possibilities,  and  would  be  subject  to  more  detailed  scrutiny  by  the 
review  panel.  Review  forms  presented  in  some  of  the  previous 
attachments  could  be  utilized  for  this  review. 

In  order  for  this  abbreviated  presentation  approach  to  be 
effective,  the  panel  has  to  receive  descriptive  material  about  all 
the  projects  beforehand.  These  writeups  would  be  about  two  to  five 
pages  in  length,  and  would  contain  the  supporting  details  of  the 
items summarized on the viewgraphs. Thus, the panel members would
enter  the  review  with  some  understanding  about  the  technical  details, 
and could focus on project linkages and investment strategy during
the review. The next attachment recommends other information which
could  help  streamline  the  review  further. 

Consider the following example. Assume a lab has a $3M per year
program  consisting  of  60  seed  money  projects,  and  assume  one  third  of 
the  program  is  reviewed  each  year.  Assume  these  projects  can  be 
aggregated  equally  into  four  technical  disciplines,  such  as 
materials,  acoustics,  mechanics,  and  remote  sensing.  The  review 
would  consist  of  the  following.  The  seed  money  program  manager  would 
spend  about  30-45  minutes  overviewing  the  program.  This  would 
include  the  lab's  mission,  and  how  it  relates  to  the  corporate 
sponsor's  mission.  It  would  also  include  the  seed  money  program's 
objectives,  and  how  they  relate  to  the  lab's  mission.  It  would 
describe  selection  and  management  criteria  for  the  projects.  Then, 
after  the  overview,  an  expert  in  each  technical  discipline  would 
present  the  projects  within  that  discipline.  Four  of  the  five 
projects  within  the  discipline  would  require  about  15  minutes  total, 
and  the  fifth  (best)  project  would  require  about  15  minutes  by 
itself.  Thus,  each  discipline  would  require  about  30  minutes  for 
presentation,  and  the  total  review,  including  overview,  would  be 
about  three  hours.  By  the  end  of  the  review,  the  panel  would 
understand  the  program's  objectives,  the  strategy  for  choosing  the 
projects,  the  importance  of  the  projects  to  science  and  the 
organization,  how  the  projects  would  help  position  the  organization 
for  the  future,  and  whether  some  high  quality  results  were  obtained. 
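
The time budget in this example can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and the 15-minute allowance for the four brief projects is the text's own rounding of three to four minutes each.

```python
# Time budget for the abbreviated seed money review described above.
# All figures come from the example in the text; names are illustrative.

projects_total = 60
fraction_reviewed = 1 / 3      # one third of the program is reviewed each year
disciplines = 4

projects_reviewed = round(projects_total * fraction_reviewed)
per_discipline = projects_reviewed // disciplines

brief_minutes_total = 15       # four brief projects together, 3-4 minutes each
best_minutes = 15              # expanded slot for the best project
discipline_minutes = brief_minutes_total + best_minutes

overview_minutes = (30, 45)    # program manager's overview, low and high

total_low = overview_minutes[0] + disciplines * discipline_minutes
total_high = overview_minutes[1] + disciplines * discipline_minutes

print(per_discipline, discipline_minutes, total_low / 60, total_high / 60)
```

Under these assumptions the presentations alone take between two and a half and two and three quarter hours, which leaves a small margin for questions within the "about three hours" quoted above.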

To  close  the  loop,  the  reviewers'  comments  would  be  sent 
anonymously  to  the  program  manager.  The  manager  would  be  required  to 
respond  in  writing  to  the  comments,  including  descriptions  of  actions 
to  be  taken  as  a  result  of  the  critiques.  The  manager's  comments 
would  be  circulated  to  the  reviewers  to  ascertain  their  satisfaction, 
and  a  final  statement  of  satisfaction  or  dissatisfaction  would  be 
sent  by  the  reviewers  to  the  assessment  manager. 

Attachment 19 - USE OF PUBLISHED PAPERS IN RESEARCH PROGRAM EVALUATION


The conduct of research project/program peer reviews in many
agencies appears designed more for the comfort of the participants
than for the efficient exchange of information. Especially in
panel reviews, the presentation focus tends to be on intricate
technical details rather than the investment strategy. The technical
details address mainly the "job right" component of peer review,
whereas the investment strategy is the focus of the "right job"
component. Much of the detailed technical information could be
supplied to the reviewers beforehand, and the valuable but usually
quite limited presentation period could be devoted more to
understanding the investment strategy rationale. However, the
reviewers and presenters (and usually the audience) tend to be
trained technically, are more comfortable in discussing technical
details, and, because of their background expertise in the areas
being reviewed, are usually willing to accept the "right job" aspects
of the technical area as fundamentally important.

It  is  the  author's  firm  contention  that  as  much  useful 
background  information  as  possible  should  be  supplied  to  the 
reviewers of a research program/project before the actual review
occurs.  In  addition  to  the  narratives  suggested  previously,  there  is 
another  source  of  valuable  information  that  has  been  almost 
completely  neglected  during  any  of  the  many  different  agency  project/ 
program  reviews  the  author  has  attended.  This  information  is  the 
written  peer  reviews  of  the  project's  papers  that  were  submitted, 
accepted,  and/or  published  by  refereed  journals.  The  following 
discussion  proposes  that  fuller  use  be  made  of  these  journal  peer 
reviews  in  the  research  program  peer  review  process. 

A published paper is really not research; it is a documentation
of  research.  However,  while  this  observation  mainly  impacts  the 
importance  ascribed  to  bibliometric  counts  in  assessing  research 
productivity  and  quality,  it  says  little  about  the  intrinsic  value  of 
a  published  paper  for  use  in  research  evaluation.  Because  of  the 
effort generated by authors/editors/reviewers in the paper
publication  process,  there  is  much  information  in  the  paper  and  the 
publication  process  that  could  be  valuable  in  research  program 
evaluation. 

Under the present system of manuscript publishing, papers are
submitted by the researcher(s) to a journal. The papers are then sent
by the journal editor, or proxy, to one or more experts in the field
for review (typically two or three experts). For a technical
article, the author(s) tend to supply many details of the technical
approach, as well as other useful information. During the manuscript
review, the reviewers typically spend substantial time addressing the
intricate details of the technical approach used in the research (as
well as addressing other criteria). The paper may be accepted or
rejected outright, or accepted pending approved revision. The
reviewers' comments, and the submitter's rebuttal (if any), stay
within the editor-submitter-reviewer group. Thus, if a researcher
has  one  published  paper  during  a  year,  and  this  is  presented  to  a 
panel of experts as part of a project/program review, all the panel
knows is that the paper passed the threshold requirements for a
particular journal. The panel does not know how many journals
rejected the article, what the comments of the rejecting peer
reviewers were, what the rebuttal comments of the submitter were, or
what the specific comments of the accepting journal peer reviewers
were. This information would be very useful to have during a
project/program review, since it could reduce the need for the
presentation  of  copious  technical  detail  during  the  review,  and  allow 
more  time  for  discussion  of  higher  order  issues  such  as  investment 
strategy  and  relevance  to  organizational  objectives. 

Since  the  sponsoring  agency  pays  for  the  research,  it  has  every 
right  to  have  full  access  to  reviewers'  comments  on  the  products  of 
the  research.  Otherwise,  the  agency  is  being  excluded  from  external 
reviews  of  research  which  it  has  supported.  The  journal 
reviewers  have  typically  expended  much  effort  in  the  technical  review 
process,  and  the  valuable  information  contained  in  their  comments  is 
not  being  used  for  the  fullest  benefit  to  the  rightful  recipients  of 
this  information,  the  research  sponsors. 

For  a  paper  which  results  from  sponsored  research,  an  agreement 
is  required  between  the  research  sponsoring  agencies/corporations  and 
the  research  journals  that  the  sponsor  of  the  paper's  research  be 
identified  when  it  is  submitted  for  publication.  Once  the  paper  has 
been  reviewed,  a  copy  of  the  journal  reviewers'  comments  would  be 
sent  to  the  sponsoring  organization  as  well  as  to  the  article 
submitter.  In  return  for  the  journal's  efforts,  the  sponsoring 
organization  would  provide  some  financial  compensation  to  the  journal 
for the review and comments. Under this system, writers of
low-to-average quality articles would be less motivated to submit randomly
to  different  journals,  since  the  peer  reviews  would  be  transmitted  to 
their  sponsoring  organizations.  This  would  have  the  positive  effect 
of  reducing  the  overwhelming  volume  of  mediocre  articles  submitted  to 
and  published  in  the  literature.  Also,  these  journal  reviews  would 
be  submitted  to  the  sponsor's  project  evaluation  panels  as  background 
material,  and,  as  stated  above,  would  reduce  the  need  for  detailed 
exposition  of  technical  approach  which  presently  consumes  much  of  the 
presentation  time  of  project  reviews. 

This  approach  would  probably  result  in  a  positive  Darwinian 
selection  process.  The  good  researchers  who  recognize  that  they  are 
doing  good  research  would  be  motivated  to  publish  more,  while  the 
mediocre/average researchers who recognize that they are doing
mid-level research would be motivated to publish less. The differences
in  numbers  and  quality  of  published  papers  between  the  good 
researchers  and  average  researchers  would  be  accentuated  and  would 
become  more  evident  to  the  review  panel,  and  the  papers  would  then 
have  more  of  an  impact  on  the  panel's  evaluation  of  a  project.  The 
journals  would  be  partially  compensated  for  their  efforts,  and  the 
journal  reviewers  could  conceivably  be  partially  compensated  for 
their  efforts.  This  could  make  journal  reviewing  a  more  attractive 
process  to  reviewers,  and  might  improve  some  of  the  review  quality 
issues  described  in  the  peer  review  Quality  section  of  this  document. 

Attachment 20 - FUNDS ALLOCATIONS BASED ON REVIEWERS' SCORES

Background

One  issue  which  has  not  been  addressed  so  far  is  the  relation  of 
the  results  of  a  research  evaluation  or  impact  assessment  to  the 
impact  on  the  unit  being  evaluated.  In  the  specific  case  of  a 
program  assessment,  there  are  multiple  philosophies  as  to  the 
required  actions  based  on  the  outcome. 

One  school  of  thought  assumes  the  program  to  exist  because  of 
strategic  value  to  the  organization.  If  the  program  fares  poorly  in 
an  evaluation,  the  funding  should  not  suffer  because  of  the  strategic 
value.  In  this  case,  program  management  could  be  changed,  the 
program  could  be  restructured  and  the  portfolio  modified,  but  the 
funding  would  be  preserved. 

Another  view  assumes  that  all  programs  have  strategic  value,  but 
that  the  final  impact  of  the  research  on  the  organization  and  on 
society  is  heavily  dependent  on  the  quality  of  the  research.  If  a 
program  receives  a  poor  evaluation,  it  is  either  reduced  or 
vertically  cut  (terminated)  and  the  resources  are  shifted  to  programs 
which  fared  better  in  the  evaluation. 

The  remainder  of  this  section  presents  an  analytic  approach  for 
shifting  resources  based  on  program  evaluation  scores.  The  approach 
relies  heavily  on  program  quality,  but  is  sufficiently  flexible  to 
take  into  account  the  strategic  value  of  the  program  to  the 
organization.  It  is  also  sufficiently  flexible  to  apply  to  the 
allocation  of  resource  increases  or  decreases  to  programs  based  on 
their  quality  scores  while  recognizing  their  strategic  value  to  the 
organization. 

Introduction 

In  the  early/mid  1980s,  the  author  examined  program  evaluation 
data  from  a  number  of  different  organizations.  In  particular,  the 
author  focused  on  programs  that  were  assessed  through  evaluation  of 
their  component  projects.  In  many  cases,  as  will  be  described 
further,  there  was  a  linear  relationship  between  project  quality  and 
cumulative  funding  for  the  program. 

A specific example will clarify the preceding statement. Assume
a program consists of ten projects, each having a value of 100K.
Assume that a review has been held, and the projects received the
following quality scores: 10, 9.5, 9, 8.5, 8, 7.5, 7, 6.5, 6, 5.5.
For conceptual purposes, order the projects by decreasing scores, and
plot project score as a function of cumulative project funding.
Thus, the coordinates of the first project are [10, 100K], the
coordinates of the second project are [9.5, 200K], the third project
[9, 300K], and so on. The graph will have the following appearance as
shown by the ones (1).

Because of the discrete nature of the numbers, the graph does
not contain the coordinate [10, 0K]. The author's observations of
the data were that most programs contained this point, and therefore
the model should contain this point. To convert the graph to a
continuous function which contains the point [10, 0K], the graph is
shifted and replotted as shown by the z's.

When  the  project  evaluation  data  is  plotted  as  above,  the 
following  hypothetical  interpretation  can  be  made.  If  funds  are 
removed  from  this  program,  the  lower  quality  projects  will  be 
eliminated,  and  both  the  threshold  quality  of  the  lowest  remaining 
project  and  the  average  program  quality  will  be  raised.  Conversely, 
if  funds  are  added  to  this  program,  lower  quality  projects  will  be 
added,  and  both  the  threshold  quality  of  the  lowest  project  and  the 
average  program  quality  will  be  lowered. 

Assume there are two programs which have been evaluated, where
one program (Ph) received a higher evaluation than the other program
(Pl). In the above model, the lowest ranked projects in Ph will have
higher quality than the lowest ranked projects in Pl. If the
objective of the reallocation process is to raise the total quality
of Ph and Pl combined, then funds must be shifted from Pl to Ph.
This will result in the elimination of the lowest ranked projects in
Pl and the addition of more conceptually lower ranked projects in Ph.
However, the new projects added to Ph will be of conceptually higher
quality than those removed from Pl. By the principle of marginal
utility, the total combined quality of Ph and Pl will be maximized
when the quality of the lowest ranked project in each program is the
same.

This  is  the  principle  behind  the  reallocation  algorithm  to  be 
developed.  An  example  will  be  shown  for  the  linear  case.  The  figure 
below  is  shown  for  the  more  general  linear  case,  where  $P  represents 
the  total  funding  in  the  program. 


[Figure: QUALITY (vertical axis, 0 to 10) versus CUMULATIVE FUNDING
(horizontal axis, 0, 100, 200, 300, ... $P). The ones (1) mark the
discrete project scores; the z's mark the shifted line Q($), which
falls from quality 10 at zero funding to the minimum quality Qm at
total funding $P.]

For multiple programs which have been evaluated, the objective
is to maximize the sum of the dollar-weighted quality of all the
programs. This can be stated mathematically as:

$$\max \left\{ \int_0^{\$P_1'} Q_1(\$)\,d\$ + \int_0^{\$P_2'} Q_2(\$)\,d\$ + \cdots \right\} \qquad (1)$$

where each integral ranges from 0 to $Pi', Qi is the
quality of program i as a function of funds $, and $Pi' is the new
total funding for program i which will maximize the sum. Qm on
the graph is the minimum value of Q, Qmi, for program i.

For the assumed linear relation between Q and $, Q can be
written as

$$Q_i(\$) = 10 + 2\,\frac{Q_{iav} - 10}{\$P_i}\,\$ \qquad (2)$$

where Qiav is the funds-weighted average of quality for program i.

If equation 2 is substituted into equation 1, the maximization
problem becomes

$$\max \left\{ \left( 10\,\$P_1' + \frac{Q_{1av} - 10}{\$P_1}\,(\$P_1')^2 \right) + \left( 10\,\$P_2' + \frac{Q_{2av} - 10}{\$P_2}\,(\$P_2')^2 \right) + \cdots \right\} \qquad (3)$$

subject to the constraint that

$$\$P_1' + \$P_2' + \cdots = \$TOT \qquad (4)$$

where $TOT is the total investment in all the programs.
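
The equal-threshold result stated earlier under the principle of marginal utility follows from the objective and the constraint by a standard Lagrange multiplier argument. The sketch below assumes that every program retains a positive allocation; the multiplier lambda is notation introduced here for illustration and does not appear in the original.

```latex
\begin{aligned}
L &= \sum_i \left[ 10\,\$P_i' + \frac{Q_{iav}-10}{\$P_i}\,(\$P_i')^2 \right]
     - \lambda \left( \sum_i \$P_i' - \$TOT \right) \\
\frac{\partial L}{\partial \$P_i'} &= 10 + 2\,\frac{Q_{iav}-10}{\$P_i}\,\$P_i' - \lambda = 0
\quad\Longrightarrow\quad Q_i(\$P_i') = \lambda \ \text{ for every program } i .
\end{aligned}
```

The left side of the stationarity condition is the linear quality function evaluated at the new funding level, so at the optimum the lowest ranked project of every program has the same quality, lambda.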

The  following  example  was  run  using  the  nonlinear  programming  package 
bundled  with  Excel  4.0.  Assume  five  programs  have  been  evaluated 
with  the  following  average  program  scores:  9,  8,  7,  6,  5.  Assume 
each  program  had  $10M  in  funds.  The  optimization  routine  yielded  the 
following  results. 

Thus, program 1 increased in funding from $10M to $21.9M as a
result of the evaluation. Its minimal quality project had an initial
value of 8 (Qm), and after the funds increase its minimal quality
project had a value of 5.62 (Qm'), the same value as the minimal
quality  project  of  the  other  programs.  This  was  predicted  above 
because  of  the  principle  of  marginal  utility.  The  total  funds 
weighted  quality  increased  from  350  to  390.  Obviously,  this  type  of 
result  could  have  been  obtained  manually  if  very  few  programs  were 
evaluated.  For  tens,  or  hundreds,  of  programs  a  computerized  method 
is  essential  for  uniformity  and  consistency. 
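
For the linear model, the optimum can also be written in closed form, which is convenient for checking solver output. The sketch below is not the Excel routine used for the example; it applies the equal-threshold condition directly, and it assumes that every program ends with a positive allocation and has an average score below 10. The function name `reallocate_linear` and its arguments are illustrative.

```python
def reallocate_linear(avg_scores, funds, total=None, q_max=10.0):
    """Closed-form optimum for the linear quality model.

    Assumes average scores below q_max and a positive final allocation
    for every program (no non-negativity constraint is enforced).
    """
    if total is None:
        total = sum(funds)
    # Slope magnitude of the linear model: Qi($) = q_max - b_i * $
    slopes = [2.0 * (q_max - q) / p for q, p in zip(avg_scores, funds)]
    # Equal thresholds: b_i * P_i' is the same gap for every program,
    # and the new allocations must add up to the total.
    gap = total / sum(1.0 / b for b in slopes)
    new_funds = [gap / b for b in slopes]
    threshold = q_max - gap          # common Qm' of all programs
    # Integral of Qi from 0 to P_i' (funds-weighted quality of program i)
    quality = [q_max * p - 0.5 * b * p * p for b, p in zip(slopes, new_funds)]
    return new_funds, threshold, quality


new_funds, threshold, quality = reallocate_linear([9, 8, 7, 6, 5], [10] * 5)
print([round(p, 1) for p in new_funds], round(threshold, 2), sum(quality))
```

For the five-program example (average scores 9, 8, 7, 6, 5 and $10M each), this gives about $21.9M for program 1, a common threshold of about 5.62, and a total funds-weighted quality of roughly 390, in line with the figures quoted above.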

The  method  is  not  restricted  to  a  linear  relation  between 
quality  and  funding.  If  it  is  desired  to  emphasize  the  strategic 
nature  of  the  programs  more,  and  reduce  the  impact  of  quality  on  the 
reallocation,  then  a  nonlinear  function  can  be  selected  in  which 
quality  is  a  weaker  function  of  funding.  Given  the  uncertainty  in 
the  accuracy  and  precision  of  reviewers'  scores,  this  type  of 
nonlinear  relationship  would  soften  the  effects  of  any  uncertainties 
on  funding  redistributions.  If  it  is  desired  to  strongly  emphasize 
quality  and  approach  the  vertical  cut  limit,  then  a  nonlinear 
function  can  be  selected  in  which  quality  is  a  stronger  function  of 
funding.  All  that  is  required  is  to  substitute  the  desired  nonlinear 
function  into  equation  2,  and  proceed  as  above. 

The following example incorporates some nonlinear functions.
Assume the following relationship between Q and $, which is a more
general version of equation (2):

$$Q_i(\$) = 10 - a(i)\,\$^n \qquad (5)$$

where a(i) is a coefficient to be determined by the boundary
conditions. When $ = $Pi, Q = Qmi, and it can be shown that

$$a(i) = \frac{10 - Q_{mi}}{(\$P_i)^n} \qquad (6)$$

Combining (5) and (6), and expressing Qmi in terms of the
funds-weighted average Qiav, yields:

$$Q_i(\$) = 10 - (n+1)\,\frac{10 - Q_{iav}}{(\$P_i)^n}\,\$^n \qquad (7)$$

For n = 1, (7) reduces to (2).

The maximization problem can now be written:

$$\max \left\{ \left( 10\,\$P_1' + \frac{Q_{1av} - 10}{(\$P_1)^n}\,(\$P_1')^{n+1} \right) + \left( 10\,\$P_2' + \frac{Q_{2av} - 10}{(\$P_2)^n}\,(\$P_2')^{n+1} \right) + \cdots \right\} \qquad (8)$$

subject to the constraint that

$$\$P_1' + \$P_2' + \cdots = \$TOT \qquad (9)$$

For n = 1, (8) reduces to (3).

Now assume five programs have been evaluated with the following
average program scores: 9, 8.5, 8, 7.5, 7. Assume each program had
$10M in funds. The optimization routine yielded the following
results for n = .5, 1, 2.

n = .5

PROGRAM   Qiav   $Pi   $Pi'   Qm    Qm'    TOTQUAL   TOTQUAL'
Prog. 1   9      10    25.4   8.5   7.61   90        214
Prog. 2   8.5    10    11.3   7.8   7.61   85        95
Prog. 3   8      10    6.4    7     7.61   80        53
Prog. 4   7.5    10    4.1    6.3   7.61   75        34
Prog. 5   7      10    2.8    5.5   7.61   70        24
                                           SUM=400   SUM=420


n = 1

PROGRAM   Qiav   $Pi   $Pi'   Qm    Qm'    TOTQUAL   TOTQUAL'
Prog. 1   9      10    17.2   8     6.55   90        143
Prog. 2   8.5    10    11.5   7     6.55   85        95
Prog. 3   8      10    8.6    6     6.55   80        71
Prog. 4   7.5    10    6.9    5     6.55   75        57
Prog. 5   7      10    5.7    4     6.55   70        48
                                           SUM=400   SUM=414

n = 2

PROGRAM   Qiav   $Pi   $Pi'   Qm    Qm'    TOTQUAL   TOTQUAL'
Prog. 1   9      10    13.4   7     4.62   90        110
Prog. 2   8.5    10    10.9   5.5   4.62   85        90
Prog. 3   8      10    9.5    4     4.62   80        78
Prog. 4   7.5    10    8.5    2.5   4.62   75        70
Prog. 5   7      10    7.7    1     4.62   70        63
                                           SUM=400   SUM=410

Thus, for the square root relationship (n=.5) between quality
and program funds, substantial funds are shifted to the highest
scoring program because the high quality drops off very slowly with
increasing addition of funds. This relationship is more appropriate
for a quality program emphasis. Conversely, for the square
relationship (n=2), relatively few funds are shifted because of the
assumed rapid dropoff of quality with increased funds. This
relationship is more appropriate for a strategic program emphasis.
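
The three cases can be cross-checked without a solver, because the power-law model of equation (7) also has a closed-form optimum. The sketch below is an independent check rather than the author's optimization routine; it assumes n > 0, average scores below 10, and a positive final allocation for every program. The function name `reallocate` and its arguments are illustrative.

```python
def reallocate(avg_scores, funds, n, total=None, q_max=10.0):
    """Stationary point of equation (8) under the total-funds constraint (9)."""
    if total is None:
        total = sum(funds)
    # Coefficient of equation (7): Qi($) = q_max - a_i * $**n
    coeffs = [(n + 1) * (q_max - q) / p ** n for q, p in zip(avg_scores, funds)]
    # Equal thresholds: a_i * P_i'**n is the same gap for every program,
    # and the new allocations must add up to the total.
    gap = (total / sum(a ** (-1.0 / n) for a in coeffs)) ** n
    new_funds = [(gap / a) ** (1.0 / n) for a in coeffs]
    threshold = q_max - gap          # common Qm' of all programs
    # Integral of Qi from 0 to P_i' (one term of equation 8)
    quality = [q_max * p - a * p ** (n + 1) / (n + 1)
               for a, p in zip(coeffs, new_funds)]
    return new_funds, threshold, quality


scores = [9, 8.5, 8, 7.5, 7]
for n in (0.5, 1, 2):
    new_funds, threshold, quality = reallocate(scores, [10] * 5, n)
    print(n, [round(p, 1) for p in new_funds], round(threshold, 2), sum(quality))
```

Run on the five programs above, the common thresholds come out near 7.61, 6.55, and 4.62 for n = .5, 1, and 2, and program 1 receives about $25.4M, $17.2M, and $13.4M respectively, matching the tabulated results to within rounding.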

The  method  is  also  applicable  to  the  cases  where  funds  are  added 
to  or  subtracted  from  an  organization's  research  budget.  The 
constraint on total funds is changed (equation 4), and the
optimization  proceeds  as  before. 

Attachment 21 - POTENTIAL USE OF ENTROPY IN RESEARCH EVALUATION

In the assessment of research or research impact, many types of
distribution  patterns  occur.  There  are  funds  allocations  across 
technical  disciplines,  funds  allocations  across  performers,  funds 
allocations  across  levels  of  development  and  the  other  cross-cuts 
mentioned  in  Attachment  2  on  investment  strategy,  papers  produced  in 
different  disciplines,  papers  co-authored  in  different  disciplines, 
papers  published  in  different  types  of  journals,  citations  by  papers 
in  different  disciplines,  citations  by  people  from  different  types  of 
institutions  and  different  countries,  patents  produced  in  different 
technologies,  patents  cited  by  papers  and  patents  in  different 
disciplines, etc. While these distributions are sometimes listed or
catalogued during an assessment, they are rarely, if ever, subjected
to a pattern analysis. Such an analysis would offer a much richer
insight into research impacts or management processes than is offered
by the standard examination of magnitudes alone. The use of entropy
to  characterize  these  distribution  patterns  offers  a  potentially 
substantial  improvement  in  output  interpretation  of  an  assessment. 

In statistical mechanics, the entropy is related to the number
of micro-states (or states of the system at the atomic level) per
macro-state (state of the system at the classical thermodynamic
level). The statistical interpretation of the second law is that
a system tends toward its most probable macro-state, which is the
state of maximum entropy. The system proceeds from a state of order
to disorder.

The information theory use of entropy is related to the
statistical mechanics definition. If a system consists of N total
units, and these units are distributed among m different states with
a distribution function n(i), then the entropy s of the system may be
written as:

$$s = -\sum_{i=1}^{m} p(i)\,\ln p(i) \qquad (1)$$

where the summation is over all states i, and p(i)
is the ratio of n(i) to N.

Thus,  for  any  distribution  n(i),  equation  (1)  allows  the  entropy 
to  be  computed.  The  entropy  can  be  interpreted  as  a  measure  of  the 
order,  or  breadth,  of  the  distribution,  and  its  change  can  be  tracked 
with  time.  It  can  serve  as  a  single  figure  of  merit  for  analyzing 
the  distribution  diversity  of  any  quantity. 
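
Two limiting cases of equation (1) are useful when interpreting the measure. The short derivation below uses the usual convention that 0 ln 0 = 0, which is an assumption added here rather than something stated in the text.

```latex
\begin{aligned}
\text{all units in one state } k:&\quad p(k) = 1,\ p(i \ne k) = 0
  &&\Rightarrow\quad s = -1 \cdot \ln 1 = 0 \\
\text{units spread equally over } m \text{ states}:&\quad p(i) = \tfrac{1}{m}
  &&\Rightarrow\quad s = -\sum_{i=1}^{m} \tfrac{1}{m} \ln \tfrac{1}{m} = \ln m
\end{aligned}
```

The entropy therefore ranges from 0 for a fully concentrated distribution to ln m for a fully even one.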

Examples  of  application  of  the  entropy  concept  to  two  of  the 
distribution  patterns  mentioned  above  follow. 

Citations  by  Papers  in  Different  Journals 

One  of  the  measures  of  research  program  impact  is  the  number  of 
citations  of  papers  produced  by  the  program.  The  initial  part  of 
this  Handbook  provides  references  of  some  citation  studies  under  the 
bibliometrics  category  of  the  quantitative  methods  section.  While 
the  number  of  citing  papers  is  very  important,  information  about  the 
citing  papers  can  be  extremely  valuable.  What  is  the  distribution  of 
citing  papers  among  different  technical  disciplines;  among  different 
journals;  among  different  institutions;  among  different  countries? 
How  can  the  impact  of  the  program  papers  on  the  citing  papers  be 
quantified  relative  to  the  above  and  other  characteristics  of  the 
citing  papers?  The  following  application  of  the  entropy  concept 
provides  a  starting  point  for  the  quantification,  but  it  will  be 
shown  that  additional  measures  are  necessary  for  further  insight  into 
the  impact. 

Assume that a paper has received 1000 citations by journal
papers. Assume also that the citing papers can be categorized by
journal quality (level 1, level 2, level 3), where each journal
quality category is denoted by i. Then the entropy of the
distribution has the same form as that given above, divided by a
normalizing constant:

$$s = -\frac{1}{\kappa} \sum_{i=1}^{3} p(i)\,\ln p(i)$$

where p(1) is the fraction of citing papers in journals of level 1
quality, p(2) is the fraction in level 2, p(3) is the fraction in
level 3, and kappa is a constant which will produce an entropy s
upper limit of unity (for three categories, kappa = ln 3).

The following table illustrates how the entropy function varies
with different numbers of citing papers in the different journal
types.

LEVEL 1    998   990   900   800   700   600   500   400   333
LEVEL 2      1     5    50   100   150   200   250   300   333
LEVEL 3      1     5    50   100   150   200   250   300   333
ENTROPY    .01   .06   .36   .58   .75   .87   .95   .99   1.0

As  all  citing  papers  are  concentrated  into  one  journal  type,  the 
entropy  measure  goes  to  zero,  and  as  the  citing  papers  are  divided 
equally  among  journal  types,  the  measure  goes  to  one.  However,  the 
table  illustrates  the  limitations  of  using  the  entropy  measure  alone. 
If  the  paper  had  received  2000  citations  distributed  among  the 
journal  types  in  the  same  ratio,  the  entropy  measure  would  have  been 
the  same.  Clearly  the  total  impact  would  not  be  reflected  in  the 
entropy  measure  as  used  here.  This  effect  could  be  overcome  by  using 
the  analogy  with  entropy  in  classical  thermodynamic  systems.  The 
entropy  measure  above  could  be  defined  as  an  entropy  per  unit,  and 
then  multiplied  by  the  total  number  of  units  in  the  system  to  get 
total  entropy.  However,  the  measure  would  now  be  substantially 
greater  than  unity  in  the  full  disorder  limit,  could  be  subject  to 
more  misinterpretation,  and  the  measure  would  lose  its  utility. 
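
The table entries and the scale limitation just described can be reproduced with a short calculation. The sketch below assumes kappa = ln m for m categories (so that the even distribution scores 1) and treats empty categories as contributing zero; the function name `normalized_entropy` is illustrative.

```python
import math


def normalized_entropy(counts):
    """Entropy of a count distribution, scaled to an upper limit of 1.

    Assumes at least two categories and a positive total count.
    """
    total = sum(counts)
    kappa = math.log(len(counts))   # normalizing constant, ln m
    s = 0.0
    for c in counts:
        if c > 0:                   # empty categories contribute nothing
            p = c / total
            s -= p * math.log(p)
    return s / kappa


columns = [(998, 1, 1), (990, 5, 5), (900, 50, 50), (800, 100, 100),
           (700, 150, 150), (600, 200, 200), (500, 250, 250),
           (400, 300, 300), (333, 333, 333)]
for col in columns:
    print(col, round(normalized_entropy(col), 2))

# Doubling every count leaves the measure unchanged.
print(normalized_entropy((1600, 200, 200)), normalized_entropy((800, 100, 100)))
```

The computed values agree with the ENTROPY row to within about 0.01, and the last line illustrates why the measure alone cannot distinguish 1000 citations from 2000 citations distributed in the same ratio.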

To measure impact of the original paper on the citing papers,
other measures will be employed in addition to the entropy function.
These other measures are the moments Mj of the citing paper
distribution function n(i). The jth moment Mj of the distribution
function n(i) is defined as:

$$M_j = \sum_{i=1}^{m} i^j\,n(i)$$

where n(i) is the number of citing papers in journal type i.

To show why using the moments of the distribution function is
useful, and to aid in the interpretation of what follows, an analogy
between the citing process and a nuclear interaction process is provided.
For  example,  if  a  high  energy  proton  interacts  with  a  natural  uranium 
target,  neutrons  will  be  released  from  the  uranium  by  spallation, 
evaporation,  and  fast  fission  [Kostoff,  1979].  These  released 
neutrons  will  have  a  wide  range  of  velocities,  which  can  be 
characterized  by  a  velocity  distribution  function.  The  released 
neutrons  can  also  interact  with  other  targets  and  have  additional 
neutron  multiplication  effects,  depending  on  the  energy  of  the 
incoming  neutron  and  the  composition  of  the  target.  With  the  use  of 
kinetic theory (collisionless for large mean free path neutrons),
moments  of  the  released  neutron  velocity  distribution  function  can  be 
used  to  obtain  macro-state  information  about  the  released  neutron 
stream. 

The  citing  process  has  some  analogues  to  the  neutron  production 
process  described  above.  The  original  published  paper  is  analogous 
to  the  high  energy  proton.  The  technical  community  that  reads  the 
published  paper  is  analogous  to  the  natural  uranium  target.  The 
citing  papers  produced  by  the  technical  community  are  analogous  to 
the  neutrons  produced.  The  quality  of  the  journals  in  which  the 
citing  papers  are  published  is  analogous  to  the  velocities  of  the 
different  neutrons. 

The  zeroth  moment  of  the  citing  paper  distribution  function  is: 

$$M_0 = \sum_{i=1}^{m} n(i)$$

In  analogy  to  kinetic  theory,  where  the  zeroth  moment  of  the 
particle  velocity  distribution  is  the  mass  density,  the  zeroth  moment 
of  the  citing  paper  distribution  shown  above  is  the  number  of  citing 
papers,  or  the  citing  paper  mass. 

The  first  moment  of  the  distribution  function  is: 

$$M_1 = \sum_{i=1}^{m} i\,n(i)$$

In  analogy  to  kinetic  theory,  where  the  first  moment  of  the 
particle  velocity  distribution  is  the  momentum  (mass*velocity)  of  the 
particle  stream,  the  first  moment  of  the  citing  paper  distribution  is 
the  citing  paper  impact. 

The  second  moment  of  the  distribution  function  is: 

$$M_2 = \sum_{i=1}^{m} i^2\,n(i)$$

In  analogy  to  kinetic  theory,  where  the  second  moment  of  the 
particle  velocity  distribution  is  the  energy  (mass*velocity^2)  of  the 
particle  stream,  the  second  moment  of  the  citing  paper  distribution 
is  the  citing  paper  energy. 

The  third  moment  of  the  distribution  function  is: 

$$M_3 = \sum_{i=1}^{m} i^3\,n(i)$$

In  analogy  to  kinetic  theory,  where  the  third  moment  of  the 
particle  velocity  distribution  is  the  flux  of  particle  energy 
(mass*velocity^3), the third moment of the citing paper distribution
is  the  citing  paper  energy  flux. 

Thus,  sole  use  of  the  zeroth  moment  of  the  citing  paper  journal 
type  distribution  provides  a  very  gross  measure  of  the  impact  (the 
number  of  citing  papers)  but  offers  little  information  about  the 
quality  of  the  impact.  In  this  particular  example,  information  about 
the types of user audience is at least as important as numbers of
users.  Is  the  author  of  the  original  paper  reaching  the  intended 
audience?  Use  of  the  entropy  of  the  citing  paper  journal  type 
distribution  shows  the  diversity  of  the  user  audience. 

Use  of  the  first  moment  allows  the  importance  assigned  to  the 
different  journal  types  to  be  factored  in  the  analysis.  To  compute 
the  first  moment,  journal  type  i  has  to  be  assigned  a  numerical  value 
which  reflects  its  importance.  In  analogy  to  kinetic  theory,  this 
numerical value is the effective "velocity" of journal type i. With
use  of  this  effective  velocity,  computation  of  the  first  moment 
yields  the  momentum,  or  total  citing  paper  impact.  In  analogy  to 
kinetic  theory,  the  ratio  of  the  first  moment  to  the  zeroth  moment  is 
the  citing  paper  "average  velocity",  or  average  impact/citing  paper. 

Use  of  the  second  moment  accentuates  the  difference  in 
importance  of  the  various  journals.  For  distributions  which  have 
similar  values  of  total  impact,  use  of  the  "energy"  will  identify 
which  of  those  distributions  rely  on  "velocity"  more  than  "mass"  for 
their  impact.  For  distributions  which  have  similar  values  of  total 
impact  and  energy,  and  where  more  differentiation  is  required,  third 
or  higher  moments  can  be  employed.  The  following  example  illustrates 
this  point.  In  this  example,  two  citing  paper  journal  distributions, 
A  and  B,  were  compared  for  a  domain  of  six  journals  of  different 
quality.  The  distributions  were  selected  such  that  the  entropy  and 
zeroth,  first,  and  second  moments  were  equal.  The  computational 
results  follow. 

JOURNAL NUMBER     1      2      3      4      5      6

                 n(3)   n(4)   n(5)   n(6)   n(7)   n(8)      s     M0     M1      M2       M3

A                 200    100    200    100    300    100   0.95   1000   5500   33100   212500

B                  92    269    218    112     86    223   0.95   1000   5500   33100   214815

The first row represents the six journals. The first six
columns of the second row represent the citing paper distribution
function for the six journals. The number in parentheses is the
value of quality (effective velocity) assigned to each of the six
journals. Thus, the entry in the first column of the second row,
n(3), is interpreted as the number of citing papers in journal 1,
where journal 1 has a quality value of 3. Continuing on the second
row, s is the entropy of the citing paper journal distribution, M0 is
the zeroth moment of this distribution, M1 is the first moment, M2 is
the second moment, and M3 is the third moment. Rows three and four
are the values of these columns for cases A and B.

All  of  the  figures  of  merit  are  the  same  for  the  two  cases 
except the third moment M3. While two cases with so many equal
figures  of  merit  would  be  an  extremely  rare  occurrence,  the  example 
does  show  the  discriminatory  capability  of  the  moment  approach.  In 
this  case,  use  of  even  higher  moments  would  provide  more  separation 
between  the  numerical  results,  and  allow  more  insight  for  the 
interpretation  of  the  results. 
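The figures of merit in the example can be recomputed with a short script. The following is a minimal sketch, not the Handbook's own implementation: it assumes the moments are weighted by the quality value assigned to each journal, and that the entropy is the information theory expression divided by the logarithm of the number of journals so that its upper limit is unity. The function and variable names are illustrative only.

```python
import math

# Quality values ("effective velocities") and citing paper counts from the example.
QUALITY = [3, 4, 5, 6, 7, 8]
CASE_A = [200, 100, 200, 100, 300, 100]
CASE_B = [92, 269, 218, 112, 86, 223]


def normalized_entropy(counts):
    """Entropy of a count distribution, scaled to an upper limit of one."""
    total = sum(counts)
    kappa = math.log(len(counts))  # assumed normalization constant
    s = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            s -= p * math.log(p)
    return s / kappa


def moment(counts, values, order):
    """Moment of the given order: sum of value**order times count."""
    return sum((v ** order) * n for v, n in zip(values, counts))


def figures_of_merit(counts, values=QUALITY):
    """Entropy (two decimals) and the zeroth through third moments."""
    merits = {"s": round(normalized_entropy(counts), 2)}
    for order in range(4):
        merits[f"M{order}"] = moment(counts, values, order)
    return merits


merits_a = figures_of_merit(CASE_A)
merits_b = figures_of_merit(CASE_B)
for label, merits in (("A", merits_a), ("B", merits_b)):
    print(label, merits)
```

Under these assumptions the script reproduces the tabulated entropy and the zeroth, first, and second moments for both cases, and the third moment for case A. Recomputing the third moment for case B from the integer counts as printed gives 214816, one unit away from the tabulated 214815; the difference is presumably due to rounding of the original counts.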

To  track  the  figures  of  merit  through  time,  and  extract  useful 
information,  analogies  can  be  made  with  aerodynamics  trajectory 
analysis.  An  aerodynamic  vehicle's  state  can  be  tracked  through 
space  and  time  to  generate  its  trajectory  (position  in  space  and 
time).  The  first  time  derivative  of  its  trajectory  is  its  velocity, 
the  second  derivative  is  the  acceleration,  and  the  third  derivative 
is the agility (ability to move inertial forces rapidly). Thus, the
entropy  and  the  moments  in  the  above  example  could  be  plotted  as  a 
function  of  time,  and  their  derivatives  obtained.  Valuable 
information  could  be  obtained  from  the  derivatives  to  see  how  the 
impact  of  an  organization's  output  is  changing  over  time,  and  how 
rapidly  shifts  are  occurring,  especially  in  response  to  new 
management  initiatives. 
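The derivative tracking described above can be approximated with finite differences when a figure of merit is sampled at discrete times. The sketch below is illustrative only: the yearly values of the first moment are hypothetical, and the helper name `finite_differences` is an assumption rather than anything defined in this Handbook.

```python
def finite_differences(times, values):
    """Estimate the rate of change between consecutive samples.

    Returns the midpoint times and the difference quotients, so the
    function can be applied repeatedly to obtain higher derivatives.
    """
    mid_times = []
    rates = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        mid_times.append((times[i] + times[i + 1]) / 2)
        rates.append((values[i + 1] - values[i]) / dt)
    return mid_times, rates


# Hypothetical yearly values of the first moment (total citing paper impact).
years = [1990, 1991, 1992, 1993, 1994]
m1_by_year = [5000, 5200, 5500, 6000, 6800]

t_vel, velocity = finite_differences(years, m1_by_year)      # first derivative
t_acc, acceleration = finite_differences(t_vel, velocity)    # second derivative
t_agi, agility = finite_differences(t_acc, acceleration)     # third derivative

print("velocity:", velocity)
print("acceleration:", acceleration)
print("agility:", agility)
```

With real data, differences of this kind amplify noise, so some smoothing of the time series would probably be needed before the higher derivatives are interpreted.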

Funds  Allocations  Across  Disciplines  or  Levels  of  Development 

Quantitative  measures  of  the  degree  of  vertical  or  lateral 
integration  in  an  organization  or  in  a  group  of  programs  would  be 
useful  to  management  for  tracking  purposes.  It  would  also  be  useful 
for  organizational  assessments  in  being  able  to  display  the  status  of 
vertical  or  lateral  integration.  While  quantitative  measures  are 
incomplete  by  themselves,  and  for  the  lateral  or  vertical  integration 
measure  here  do  not  address  the  strength  of  the  linkages  among  the 
different  related  disciplines  or  levels  of  development,  they  do 
provide a starting point for identifying potential problem areas.

Vertical  or  lateral  integration  within  an  organization  makes  it 
easier  for  multiple  level  of  development  or  discipline  funds  to  be 
managed  jointly  and  at  lower  levels  in  the  organization.  The  degree 
of  multiple  level  of  development  or  discipline  funds  management  by  an 
organizational  unit  is  one  component  of  vertical  or  lateral 
integration. 

The  quantitative  measure  proposed  here  for  ascertaining  the 
funds  mixing  component  of  vertical  or  lateral  integration  is  the 
degree  to  which  different  categories  of  funds  are  managed  jointly  and 
at the lower levels in the organization. From this perspective, one
aspect of vertical or lateral integration can be viewed as a process
by which management of different level of development or discipline
funds by the same unit diffuses into the lower levels of the
organization.

The  measure  could  take  different  mathematical  forms.  Some 
desirable  limiting  conditions  include:  1)  for  a  given  amount  of  funds 
managed by the unit of interest (say, a Technical Manager), the
measure  should  go  to  zero  as  all  funds  are  lumped  into  one  level  of 
development  or  discipline;  2)  the  measure  should  go  to  one  as  the 
funds  are  equally  divided  among  the  levels  of  development  or 
disciplines;  3)  the  measure  should  range  between  zero  and  one  and  be 
smooth  in  this  region. 

Many  mathematical  measures  could  be  defined  which  have  these 
desirable  properties.  Since  the  problem  is  in  essence  a  funds  mixing 
problem,  and  since  there  is  a  precedent  for  using  entropy  as  a 
measure  in  physical  or  chemical  mixing  problems,  the  entropy 
definition  above  will  be  used  as  the  metric  for  assessing  the 
vertical  or  lateral  integration  funds  mixing  component. 

The  following  example  is  for  vertical  integration,  but  with  some 
modifications  could  apply  equally  well  to  lateral  integration. 
Assume  there  are  three  levels  of  funds  to  be  integrated:  basic 
research,  applied  research,  and  development.  Assume  further  that  the 
unit  of  analysis  is  all  programs  under  each  Technical  Manager  in  the 
organization.  Then,  for  each  Technical  Manager,  the  entropy  metric 
for  his  programs  is  given  by  the  information  theory  expression  for 
entropy: 


$$s = -\frac{1}{\kappa} \sum_{i=1}^{3} p(i) \ln p(i)$$

where p(1) is the fraction of the Technical Manager's funds in basic
research,  p(2)  is  the  fraction  in  applied  research,  p(3)  is  the 
fraction  in  development,  and  kappa  is  a  constant  which  will  produce 
an  entropy  s  upper  limit  of  unity. 

The  following  table  illustrates  how  the  entropy  function  varies 
with  different  amounts  of  funds  in  the  different  levels  of 
development  in  the  Technical  Manager's  program.  Each  column 
represents  different  distributions  of  a  $1000  total  program. 

BAS. RES.   999.999    999    990    900    800    700    600    500    400    333

APR. RES.     .0005     .5      5     50    100    150    200    250    300    333

DEVELOP.      .0005     .5      5     50    100    150    200    250    300    333

ENTROPY           0    .01    .06    .36    .58    .75    .87    .95    .99    1.0

As  all  funds  are  concentrated  into  one  level  of  development,  the 
measure  goes  to  zero,  and  as  the  funds  are  divided  equally  among 
levels,  the  measure  goes  to  one. 
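The entropy row of the table can be checked numerically. The sketch below assumes that kappa equals the natural logarithm of the number of funding levels (ln 3 here), which is the value that gives an upper limit of unity; the function name `funds_entropy` is illustrative.

```python
import math


def funds_entropy(funds):
    """Normalized entropy of a funds distribution across levels of development."""
    total = sum(funds)
    kappa = math.log(len(funds))  # assumed: ln(number of levels) gives a maximum of one
    s = 0.0
    for f in funds:
        if f > 0:  # an empty level contributes nothing (p * ln p tends to zero)
            p = f / total
            s -= p * math.log(p)
    return s / kappa


# Columns of the table: (basic research, applied research, development).
columns = [
    (999.999, 0.0005, 0.0005),
    (999, 0.5, 0.5),
    (990, 5, 5),
    (900, 50, 50),
    (800, 100, 100),
    (700, 150, 150),
    (600, 200, 200),
    (500, 250, 250),
    (400, 300, 300),
    (333, 333, 333),
]
entropies = [funds_entropy(c) for c in columns]
for column, s in zip(columns, entropies):
    print(column, f"{s:.3f}")
```

Under this assumption the computed values agree with the tabulated entropies to within about 0.01.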

The  first  part  of  the  following  discussion  applies  to 
implementing  the  measure  for  tracking  total  organization  performance, 
and  the  second  part  applies  to  implementing  the  measure  for  tracking 
individual program performance. The measure would be implemented in
the following manner for the total organization. The organization's
management  at  all  levels  would  examine  all  programs  and  decide  how 
the  funds  integration  should  be  structured.  This  is  the  key  step  in 
the  process,  and  requires  that  the  different  modes  by  which  vertical 
integration  will  be  effected  be  defined  and  planned  for 
implementation.  There  may  be  technical  areas  or  Technical  Managers 
where  the  vertical  integration  would  be  effected  through  close 
coordination  and  cooperation  rather  than  funds  mixing.  For  example, 
generic  research  areas  with  multiple  higher  level  of  development 
applications  would  be  one  candidate. 

Once  the  degree  of  desired  funds  mixing  has  been  determined 
within  the  context  of  the  overall  vertical  integration  structure,  the 
measure  chosen  would  be  computed  for  each  program  and  Technical 
Manager.  The  measure  would  be  computed  for  the  existing  degree  of 
funds  mixing  and  for  the  desired  degree  of  funds  mixing  (the  funds 
mixing target). Aggregates of the measure for each Technical
Manager,  Division,  Office,  etc.,  and  for  the  total  organization  would 
be  computed  and  tracked.  The  actual  measure  levels  would  be  tracked 
against  the  measure  targets,  and  progress  in  achieving  the  targets 
monitored. 

Because  entropy  does  not  define  a  pattern  uniquely,  supplemental 
measures  would  be  of  benefit.  One  such  approach  would  be  to  track 
actual  funds  deviation  from  a  desired  funds  mixing  target.  The 
starting  point  of  this  approach  is  to  define  the  different  level  of 
development  funds  targets  for  each  Technical  Manager.  Then,  the 
square  of  the  difference  between  the  actual  funds  each  Technical 
Manager  has  in  each  level  of  development  at  a  point  in  time  and  the 
target  funds  for  each  level  of  development  for  the  Manager  would  be 
computed  and  tracked.  As  time  proceeds,  this  'residual'  should 
decrease.  Aggregates  of  this  'residual'  over  Division,  Office,  total 
organization  would  be  computed  and  tracked  as  proposed  above  for  the 
entropy  measure.  This  measure  could  be  normalized  in  the  form  of  a 
coefficient  for  easier  interpretation,  or  could  remain  in  the  form  of 
funds.
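The residual measure can be sketched as a sum of squared deviations between actual and target funds. In the example below the funds figures are hypothetical, and the normalization (dividing by the square of the total target funds) is only one possible choice of coefficient; the text above does not specify a particular form.

```python
def funds_residual(actual, target):
    """Sum of squared differences between actual and target funds per level."""
    return sum((a - t) ** 2 for a, t in zip(actual, target))


def normalized_residual(actual, target):
    """Residual divided by the squared total target (an assumed normalization)."""
    return funds_residual(actual, target) / (sum(target) ** 2)


def aggregate_residual(units):
    """Aggregate over several units (e.g. all Technical Managers in a Division)."""
    return sum(funds_residual(actual, target) for actual, target in units)


# Hypothetical target and yearly actuals for one Technical Manager:
# (basic research, applied research, development).
target = (400, 300, 300)
actual_by_year = [(800, 100, 100), (600, 200, 200), (400, 300, 300)]

residuals = [funds_residual(a, target) for a in actual_by_year]
coefficients = [normalized_residual(a, target) for a in actual_by_year]
print(residuals)
print(coefficients)
```

In this hypothetical series the residual falls each year and reaches zero when the target mix is achieved, which is the behavior the tracking process is meant to reveal.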

The  entropy  measure  would  also  be  useful  for  tracking  programs 
over  time  as  they  pass  through  different  levels  of  development.  Well 
run  programs  would  have  hills  and  valleys  in  the  entropy-time  plot, 
with  smooth  temporal  entropy  gradients.  A  typical  program  would  have 
low  entropy  when  it  is  entirely  in  the  basic  research  phase.  Its 
entropy  would  rise  to  near  unity  as  the  program  transitions  from 
basic  to  applied  research,  and  both  types  of  funds  are  used  to 
finance  the  program.  The  entropy  would  decrease  again  as  the  basic 
research  funds  are  phased  out  and  the  applied  research  funds  become 
dominant.  The  entropy  would  increase  as  applied  research  proceeds 
and  development  funds  are  phased  in.  These  cycles  would  be  repeated 
as  the  development  process  proceeds.  In  the  tracking  of  the  temporal 
entropy  plot,  if  the  entropy  remains  low  during  different  development 
phases,  this  means  that  abrupt  transitions  to  different  phases  are 
occurring.  This  condition  is  less  desirable  than  the  gradual 
transitions  depicted  above,  and  is  readily  observable  from  the 
entropy  trajectory.  Again,  measures  supplemental  to  entropy  could  be 
employed in the tracking process to enhance the interpretation of the
output. A quantitative tracking approach as described becomes
especially useful when management must track tens or hundreds of
programs.
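The entropy-time behavior described above can be illustrated with two hypothetical programs, one with gradual transitions between levels of development and one with abrupt transitions. This is a sketch under the same assumption used for the funds table (kappa equal to ln 3 for three levels); the funding profiles are invented for illustration.

```python
import math


def funds_entropy(funds):
    """Normalized entropy of funds across levels (kappa = ln of number of levels)."""
    total = sum(funds)
    kappa = math.log(len(funds))
    s = 0.0
    for f in funds:
        if f > 0:
            p = f / total
            s -= p * math.log(p)
    return s / kappa


# Hypothetical yearly funds: (basic research, applied research, development).
gradual = [(100, 0, 0), (50, 50, 0), (0, 100, 0), (0, 50, 50), (0, 0, 100)]
abrupt = [(100, 0, 0), (100, 0, 0), (0, 100, 0), (0, 100, 0), (0, 0, 100)]

gradual_entropy = [funds_entropy(f) for f in gradual]
abrupt_entropy = [funds_entropy(f) for f in abrupt]

print([round(s, 2) for s in gradual_entropy])
print([round(s, 2) for s in abrupt_entropy])
```

The gradual program shows the hills and valleys described above, while the abrupt program stays at zero throughout. Note that with three levels and kappa = ln 3, an even split between two levels peaks at about 0.63 rather than unity; if peaks near one are wanted during a two-level transition, the measure would need to be normalized by the number of levels active in that transition (ln 2).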

In  summary,  the  distribution  patterns  which  occur  in  research 
assessments  contain  much  useful  information.  Present  techniques 
extract  relatively  little  of  this  information  in  practice.  Use  of 
concepts  from  thermodynamics  and  other  fields  such  as  entropy, 
momentum,  and  energy  can  improve  the  information  extraction  process, 
and  aid  in  the  interpretation  of  the  results  through  physical 
analogies.

VIII. ANALYSIS OF RIA LITERATURE


This section includes contributions from DR. RONALD N. KOSTOFF,
OFFICE  OF  NAVAL  RESEARCH;  MR.  HENRY  J.  EBERHART,  NAVAL  AIR  WARFARE 
CENTER  CHINA  LAKE;  MR.  DARREL  R.  TOOTHMAN,  DSTI,  INC.;  DR.  ROBERT 
PELLENBARG,  NAVAL  RESEARCH  LABORATORY. 

INTRODUCTION 

This  section  shows  how  Database  Tomography  can  be  used  to  derive 
technical  intelligence  from  the  published  literature.  Database 
Tomography  is  a  patented  system  for  analyzing  large  amounts  of 
textual  computerized  material.  It  includes  algorithms  for  extracting 
multi-word  phrase  frequencies  and  performing  phrase  proximity 
analyses.  Phrase  frequency  analysis  provides  the  pervasive  themes  of 
a  database,  and  the  phrase  proximity  analysis  provides  the 
relationships  among  the  pervasive  themes,  and  between  the  pervasive 
themes  and  sub-themes. 
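The two kinds of analysis can be illustrated generically. The sketch below is not the patented Database Tomography system; it is a plain n-gram count and a fixed-window co-occurrence count written for illustration, with hypothetical abstracts and function names, and without the stop-word filtering a practical tool would need.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_frequencies(texts, n):
    """Count every n-word phrase across a list of texts."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def proximity_counts(texts, theme, window=5):
    """Count words appearing within `window` words of each occurrence of `theme`."""
    theme_words = theme.lower().split()
    k = len(theme_words)
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for i in range(len(words) - k + 1):
            if words[i:i + k] == theme_words:
                lo = max(0, i - window)
                hi = min(len(words), i + k + window)
                for j in range(lo, hi):
                    if j < i or j >= i + k:  # skip the theme phrase itself
                        counts[words[j]] += 1
    return counts


# Hypothetical abstracts, for illustration only.
abstracts = [
    "Peer review is widely used for research evaluation.",
    "Citation analysis complements peer review in research evaluation.",
    "Bibliometric indicators support citation analysis.",
]

bigram_counts = phrase_frequencies(abstracts, 2)
near_peer_review = proximity_counts(abstracts, "peer review", window=2)
print(bigram_counts.most_common(3))
print(sorted(near_peer_review))
```

High-frequency phrases play the role of pervasive themes, and the words found near a chosen theme suggest its related themes and sub-themes.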

One  potential  application  of  Database  Tomography  is  to  obtain 
the  thrusts  and  interrelationships  of  a  technical  field  from  papers 
published  in  the  literature  within  that  field.  This  section  provides 
applications of Database Tomography to analyses of both the
non-technical field of Research Impact Assessment (RIA) and the technical
field of Chemistry.

A  database  of  relevant  RIA  articles  was  analyzed  to  produce 
characteristics  and  key  features  of  the  RIA  field.  The  recent 
prolific  RIA  authors,  the  journals  prolific  in  RIA  papers,  the 
prolific  institutions  in  RIA,  the  prolific  keywords  specified  by 
the  authors,  and  the  authors  whose  works  are  cited  most 
prolifically  as  well  as  the  particular  papers/  journals/ 
institutions  cited  most  prolifically,  are  identified.  The 
pervasive  themes  of  RIA  are  identified  through  multi-word  phrase 
analyses  of  the  database.  A  phrase  proximity  analysis  of  the 
database  shows  the  relationships  among  the  pervasive  themes,  and 
the  relationships  between  the  pervasive  themes  and  subthemes. 

A  similar  process  was  applied  to  Chemistry,  with  the  exception 
that  the  database  was  limited  to  one  year's  issues  of  the  Journal  of 
the  American  Chemical  Society.  Wherever  possible,  the  RIA  and 
Chemistry  results  were  compared.  Finally,  the  conceptual  use  of 
Database  Tomography  to  help  identify  promising  research  directions 
was  discussed. 


BACKGROUND 

Science  and  technology  are  assuming  an  increasingly  important 
role  in  the  conduct  and  structure  of  domestic  and  foreign  business 
and  government.  In  the  highly  competitive  civilian  and  military 
worlds, there has been a concomitant increase in the need for
scientific and technical intelligence to ensure that one's perceived
adversaries do not gain an overwhelming advantage in the use of
science and technology. While there is no substitute for direct
human intelligence gathering, many techniques have become available
which can support and complement direct human intelligence
gathering. In particular, techniques which identify, select, gather,
cull, and interpret large amounts of technological information
semi-autonomously can greatly expand the capabilities of human beings
for performing technical intelligence.

This section shows how Database Tomography [Kostoff, 1993f,
1994h, 1995e] can be used to derive technical intelligence from the
published literature. As stated previously, Database Tomography is
a patented system for analyzing large amounts of textual computerized
material. It includes algorithms for extracting multi-word phrase
frequencies and performing phrase proximity analyses. The
phrase frequency analysis provides the pervasive themes of a
database, and the phrase proximity analysis provides the
relationships among the pervasive themes, and between the pervasive
themes and sub-themes.

One  potential  application  of  Database  Tomography  is  to  obtain 
the  thrusts  and  interrelationships  of  a  technical  field  from  papers 
published  in  the  literature  within  that  field.  This  section 
originated  with  a  benchmark  application  of  Database  Tomography  to 
analysis of the field of Research Impact Assessment (RIA).

To  execute  the  study  reported  in  this  paper,  a  database  of 
relevant  RIA  articles  is  generated  using  a  unique  search  approach 
(See Section IX-A), and the database is analyzed to produce
characteristics  and  key  features  of  the  RIA  field.  The  recent 
prolific  RIA  authors,  the  journals  prolific  in  RIA  papers,  the 
prolific  institutions  in  RIA,  the  prolific  keywords  specified  by  the 
authors,  and  the  authors  whose  works  are  cited  most  prolifically  as 
well  as  the  particular  papers  cited  most  prolifically,  are 
identified.  In  addition,  the  most  highly  cited  years,  journals,  and 
countries  are  also  shown.  The  pervasive  themes  of  RIA  are  identified 
through  multi-word  phrase  analyses  of  the  database.  A  phrase 
proximity  analysis  of  the  database  shows  the  relationships  among  the 
pervasive  themes,  and  the  relationships  between  the  pervasive  themes 
and subthemes.

Based  on  the  positive  benchmark  results  for  RIA,  the  application 
of Database Tomography to a technical field, Chemistry, was then
performed,  and  the  results  from  the  two  studies  are  compared  where 
practical.  To  execute  the  Chemistry  study,  a  database  of  all  papers 
published  in  the  1994  edition  of  a  leading  Chemistry  journal,  the 
Journal of the American Chemical Society (JACS), as abstracted in the
Science  Citation  Index  (SCI)  is  generated,  and  the  database  is 
analyzed  to  produce  characteristics  and  key  features  of  the  Chemistry 
field  as  reflected  in  JACS.  The  recent  prolific  JACS  authors,  the 
prolific  institutions  in  JACS,  the  prolific  keywords  specified  by  the 
authors,  and  the  authors  whose  works  are  cited  most  prolifically  as 
well  as  the  particular  papers  cited  most  prolifically,  are 
identified.  In  addition,  the  most  highly  cited  years,  journals,  and 
countries  are  also  shown.  The  pervasive  themes  of  JACS  are 
identified  through  multi-word  phrase  analyses  of  the  database.  A 
phrase  proximity  analysis  of  the  database  shows  the  relationships 
among  the  pervasive  themes,  and  the  relationships  between  the 
pervasive themes and subthemes.

In the Appendices to this section, selected results from other
Database  Tomography  studies  are  shown  to  display  further  capabilities 
of  this  system.  One  form  of  taxonomy  from  a  Near-Earth  Space  study 
is  shown;  another  type  of  taxonomy  from  a  Former  Soviet  Union  applied 
research  study  is  presented;  and  a  method  to  help  identify  promising 
research  directions  from  computerized  analysis  of  the  published 
literature  is  discussed. 

What is the importance of applying Database Tomography to a
non-physical science field such as RIA, or a physical science field such
as Chemistry? Database Tomography provides a map of the field of
interest  and,  analogous  to  ordinary  roadmaps,  serves  as  a  structured 
guide  to  reach  a  specific  destination  efficiently.  Suppose  one  wants 
to  understand  the  limitations  of  the  major  RIA  techniques,  and 
perhaps  identify  promising  avenues  for  improving  these  techniques. 
One  could  start  with  hit-or-miss  literature  searches  or  randomized 
personal  contacts,  or  one  could  start  with  Database  Tomography. 

Database  Tomography  would  identify  the  main  intellectual  thrust 
areas  in  RIA  or  Chemistry,  and  the  relationships  among  those  thrust 
areas.  As  part  of  the  analysis  output,  the  main  RIA  or  Chemistry 
techniques  conceptualized  and  employed  would  be  identified.  The 
major  journals  associated  with  each  thrust  area  and  technique  would 
be  identified,  the  major  authors  for  each  technique  and  thrust  area 
would  be  identified,  and  the  major  institutions  and  countries 
associated  with  each  technique  and  thrust  area  would  be  identified. 
The  ancillary  techniques  and  the  science  and  technology  areas  which 
could  support  and  improve  a  technique  or  thrust  area  would  be 
identified,  and  conversely  techniques  or  thrust  areas  which  could  be 
impacted  by  a  given  technique  would  be  identified. 

The  map,  then,  provides  a  comprehensive  overview  of  the  full 
picture,  and  allows  specific  starting  points  to  be  chosen  rationally 
for  more  detailed  investigations  into  a  topic  of  interest.  It  does 
not  obviate  the  need  for  detailed  investigation  of  the  literature  or 
interactions  with  the  main  performers  of  a  given  topical  area  in 
order  to  make  a  substantial  contribution  to  the  understanding  or  the 
advancement  of  this  topical  area,  but  allows  these  detailed  efforts 
to  be  executed  more  efficiently. 

DATABASE  GENERATION 

The  key  step  in  the  RIA  literature  analysis  is  the  generation  of 
the  database.  For  the  present  study,  the  database  consists  of 
selected journal abstracts (including authors, titles, journals,
author  addresses,  author  keywords,  abstract  narratives,  and 
references  cited  for  each  paper)  obtained  by  searching  the  Science 
Citation Index (SCI) and the Social Sciences Citation Index (SSCI).
The  SCI  accesses  about  3000  journals  (mainly  in  the  physical 
sciences)  and  the  SSCI  accesses  about  half  that  amount  (mainly  in  the 
social sciences). In the SCI and SSCI, the title, keyword, and
abstract  fields  were  searched  using  keywords  relevant  to  RIA.  The 
resultant  abstracts  were  culled  to  those  relevant  to  RIA. 

The search was performed with the recently developed technique
of Simulated Nucleation (See Section IX-A; also Kostoff, 1997f),
which includes two powerful Database Tomography tools: multi-word
phrase  frequency  analysis  and  phrase  proximity  analysis.  An  initial 
database  of  titles,  keywords,  and  abstracts  was  created  from  a  core 
of  papers  known  to  be  highly  relevant  to  RIA.  A  phrase  frequency 
analysis  was  performed  on  this  textual  database.  The  high  frequency 
single,  double,  and  triple  word  phrases  obviously  relevant  to  RIA 
were  then  used  as  search  terms  in  the  SCI  and  SSCI  databases.  The 
process  was  repeated  on  the  new  database  of  titles,  keywords,  and 
abstracts  which  was  found.  A  few  more  iterations  were  performed 
until  convergence  was  obtained.  Before  the  final  iteration,  a  phrase 
proximity  analysis  was  performed  on  the  database  in  addition  to  the 
phrase  frequency  analysis.  This  additional  analysis  provided 
relevant  phrases  closely  related  to  the  main  themes  which  may  not 
have  had  high  frequency  occurrence.  The  value  of  this  search 
approach  is  that  the  search  terms  are  obtained  from  the  authors  in 
the  SCI  and  SSCI  databases,  not  by  guessing  on  the  part  of  the 
searcher.  The  resulting  final  database  may  be  the  most  complete  RIA 
journal  database  in  existence.  The  titles  of  the  papers  in  the  final 
RIA  database  are  listed  at  the  end  of  this  Handbook. 
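The iterative search loop can be sketched in a few lines. This is a toy illustration of the general idea only, not the Simulated Nucleation procedure of Section IX-A: the corpus stands in for the SCI and SSCI, the `is_relevant` predicate stands in for the analyst's judgment of which phrases are relevant, only two-word phrases are used, and the proximity analysis step is omitted. All names and data are hypothetical.

```python
import re
from collections import Counter


def bigrams(text):
    """Return the two-word phrases in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


def frequent_phrases(docs, min_count):
    """Phrases that occur at least `min_count` times in the current database."""
    counts = Counter(p for d in docs for p in bigrams(d))
    return {p for p, c in counts.items() if c >= min_count}


def iterate_search(corpus, core, is_relevant, min_count=2, max_rounds=10):
    """Grow a database from a core set until no new documents are found."""
    database = set(core)
    for _ in range(max_rounds):
        terms = {p for p in frequent_phrases(database, min_count) if is_relevant(p)}
        found = {d for d in corpus if any(t in bigrams(d) for t in terms)}
        expanded = database | found
        if expanded == database:  # convergence: the search added nothing new
            break
        database = expanded
    return database


# Hypothetical searchable corpus and core papers.
corpus = [
    "peer review of research proposals",
    "citation analysis and peer review",
    "peer review versus citation analysis",
    "citation analysis of chemistry journals",
    "synthesis of organic compounds",
]
core = ["peer review of research proposals", "peer review in funding agencies"]

STOP_WORDS = {"of", "and", "in", "versus"}


def is_relevant(phrase):
    """Stand-in for analyst judgment: reject phrases containing stop words."""
    return not any(word in STOP_WORDS for word in phrase.split())


final_database = iterate_search(corpus, core, is_relevant)
print(sorted(final_database))
```

In this toy run the first pass retrieves the peer review papers, the second pass picks up "citation analysis" as a new high-frequency phrase and retrieves a further paper, and the third pass adds nothing, so the loop stops. The unrelated chemistry synthesis paper is never retrieved.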

As  stated  in  the  background  section,  the  JACS  database  consisted 
of  SCI  abstractions  of  all  the  papers  contained  in  the  1994  issues  of 
JACS. 


PROLIFIC  AUTHORS 

In  both  RIA  and  JACS,  the  author  field  was  separated  from  the 
database,  and  a  frequency  count  of  author  appearances  was  made.  The 
most  prolific  authors  follow,  in  order  of  decreasing  publications. 
Two  caveats  are  in  order  here. 

For  RIA,  the  journals  searched  were  limited  to  those  in  the  SCI 
and  SSCI.  Relevant  articles  in  other  journals  were  not  included. 
Books  or  major  reports  were  not  included.  The  keywords  used  were  a 
finite  set  of  the  author's  discretion,  and  undoubtedly  overlooked 
some  relevant  articles  in  RIA.  The  time  frame  of  the  articles 
included  in  the  present  analysis  was  1991-early  1995.  Thus,  there 
may  be  excellent  researchers  writing  in  the  field  of  RIA  who  were 
omitted  from  the  following  list  due  to  the  finite  selection  process, 
and  the  author's  apologies  are  extended  to  anyone  who  falls  into  this 
category.  In  particular,  those  authors  whose  work  has  been 
referenced  in  the  main  body  of  this  Handbook,  and  who  do  not  appear 
on  the  following  list,  should  be  considered  as  an  ex  officio  part  of 
the  list. 

For  the  Chemistry  component  of  the  study,  only  JACS  was  used. 
The  time  frame  of  the  study  is  1994.  Relevant  Chemistry  articles  in 
other  journals  were  not  included.  Books  or  major  reports  were  not 
included.  Thus,  there  are  undoubtedly  excellent  researchers  writing 
in  the  field  of  Chemistry  who  were  omitted  from  the  following  list 
due  to  the  finite  selection  process,  and  the  authors'  apologies  are 
extended  to  anyone  who  falls  into  this  category. 

There  were  approximately  2300  RIA  papers  retrieved  and 
approximately  2150  JACS  papers.  There  were  approximately  2975  RIA 
authors, and approximately 6535 JACS authors, which average to 1.3
authors per RIA paper, and 3 authors per JACS paper. The ratio of
JACS  authors  per  paper  does  not  differ  appreciably  from  the  3.37 
authors  per  paper  obtained  in  a  recent  study  of  the  near-earth  space 
literature.  It  appears  that  the  RIA  papers  tend  to  be  individual 
efforts,  while  the  JACS  (and  space)  papers  tend  to  be  team  efforts. 
The  JACS  (and  space)  studies  could  involve  multiple  disciplines  and 
potentially large experiments (certainly true for the space studies),
which  would  account  for  the  difference  in  authors  per  paper. 

87.3%  of  the  RIA  authors  produced  one  paper  and  7.3%  produced 
two  papers,  while  84.3%  of  the  JACS  authors  produced  one  paper  and 
10.7%  produced  two  papers.  Thus,  in  both  cases,  about  5%  of  the 
authors  produced  three  or  more  papers,  although  in  each  case  the  mode 
author  produced  one  paper.  However,  as  Table  1  shows,  a  few  authors 
in  each  field  produced  an  order  of  magnitude  more  papers  than  the 
average  or  mode  author.  While  the  RIA  numbers  are  spread  over  four 
years,  the  JACS  numbers  are  for  a  single  year,  and  the  top  JACS 
numbers  are  quite  impressive. 
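The author statistics above follow from a simple frequency count over the author field. The sketch below uses a small hypothetical set of paper records to show the computation, then checks the reported ratios against the approximate totals quoted in the text. It assumes, as the quoted figures suggest, that authors per paper means distinct authors divided by papers.

```python
from collections import Counter


def author_statistics(papers):
    """Frequency count of author appearances plus summary ratios.

    `papers` is a list of author lists, one per paper.
    """
    counts = Counter(author for authors in papers for author in authors)
    n_authors = len(counts)
    single = sum(1 for c in counts.values() if c == 1)
    return {
        "counts": counts,
        "authors_per_paper": n_authors / len(papers),
        "share_with_one_paper": single / n_authors,
    }


# Hypothetical author fields, for illustration only.
toy_papers = [["A"], ["A", "B"], ["C"], ["A"]]
stats = author_statistics(toy_papers)
print(stats["counts"].most_common())
print(stats["authors_per_paper"], stats["share_with_one_paper"])

# Ratios implied by the approximate totals reported in the text.
ria_authors_per_paper = 2975 / 2300
jacs_authors_per_paper = 6535 / 2150
print(round(ria_authors_per_paper, 1), round(jacs_authors_per_paper, 1))
```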

TABLE  1  MOST  PROLIFIC  AUTHORS  -  RIA 

GARFIELD-E 91; SCHUBERT-A 18; VANRAAN-AFJ 17; GLANZEL-W 14;
BRAUN-T 13; GRILICHES-Z 11; MCCAIN-KW 10; LEYDESDORFF-L 10;
NARIN-F 9; KOSTOFF-RN 9; COURTIAL-JP 9; BONITZ-M 9; VINKLER-P 8;
NEDERHOF-AJ 8; MOED-HF 8; EGGHE-L 8; ROUSSEAU-R 7;
WELLJAMSDOROF-A 6; TIJSSEN-RJW 6; TERRADA-ML 6; PINERO-JML 6;
PETERS-HPF 6; PERITZ-BC 6; PAO-ML 6; MENDEZ-A 6; MACZELKA-H 6;
LANCASTER-FW 6;


TABLE 1A MOST PROLIFIC AUTHORS - JACS

SCHLEYER-PV 13, RHEINGOLD-AL 13, BOGER-DL 13, TROST-BM 10,
PAQUETTE-LA 10, WHITESIDES-GM 9, SPIRO-TG 9, REBEK-J 9,
MOROKUMA-K 8, LIPPARD-SJ 8, HROVAT-DA 8, HAW-JF 8, DIXON-DA 8,
BUCHWALD-SL 8, BORDEN-WT 8, ADAM-W 8, KITAGAWA-T 7,
HOUK-KN 7, GELLMAN-SH 7, BRAUMAN-JI 7, WILLNER-I 6,
SQUIRES-RR 6, SCHREIBER-SL 6, ROBB-MA 6, OLIVUCCI-M 6,
NICOLAOU-KC 6, INGOLD-KU 6, ECHEGOYEN-L 6, CLARDY-J 6,
BORDWELL-FG 6, BERNARDI-F 6, BERGMAN-RG 6, ARDUENGO-AJ 6,


CODE:  THE  NUMBER  FOLLOWING  EACH  AUTHOR'S  NAME  REPRESENTS  THE  NUMBER 
OF  PAPERS  AUTHORED  OR  CO-AUTHORED  IN  THE  LITERATURE  DATABASE. 

PROLIFIC  JOURNALS 


A  similar  process  was  used  to  develop  a  frequency  count  of  journal 
appearances  for  RIA.  Similar  limitations  to  those  mentioned  above 
apply  to  the  journals,  and  similar  apologies  are  extended  to  journals 
not  listed.  The  most  prolific  journals  follow  in  order  of  decreasing 
frequency.  While  many  disciplines  are  represented  in  the  RIA  table, 
there  seems  to  be  large  representation  from  the  Medical/ 
Psychological  Sciences  field  and  the  Information/  Library  Sciences 
field. There are 645 separate journals listed for RIA. While the
average number of papers per journal is 3.57, the most prolific
journals  contain  one  to  two  orders  of  magnitude  more  RIA  papers. 

TABLE  2  MOST  PROLIFIC  JOURNALS  -  RIA 

SCIENTOMETRICS 336; CURRENT CONTENTS/LIFE SCIENCES 139; CURRENT
CONTENTS 86; CURRENT CONTENTS/SOCIAL & BEHAVIORAL SCIENCES 68;
CURRENT CONTENTS/CLINICAL MEDICINE 68; CURRENT CONTENTS/PHYSICAL
CHEMICAL & EARTH SCIENCES 44; CURRENT CONTENTS/ENGINEERING
TECHNOLOGY & APPLIED SCIENCES 41; SCIENCE 40; NATURE 34;
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE 33; BRITISH
MEDICAL JOURNAL 31; JAMA-JOURNAL OF THE AMERICAN MEDICAL
ASSOCIATION 26; BEHAVIORAL AND BRAIN SCIENCES 25; SCIENTIST 20;
SCIENCES 20; CURRENT CONTENTS/AGRICULTURE BIOLOGY & ENVIRONMENTAL
20; INFORMATION PROCESSING & MANAGEMENT 19; BULLETIN OF THE
MEDICAL LIBRARY ASSOCIATION 17; JOURNAL OF INFORMATION SCIENCE 16;
AMERICAN PSYCHOLOGIST 16; LIBRARY & INFORMATION SCIENCE RESEARCH
15; HIGHER EDUCATION 15;

CODE:  THE  NUMBER  FOLLOWING  EACH  JOURNAL  REPRESENTS  THE  NUMBER  OF 
PAPERS  IN  THE  LITERATURE  DATABASE  PUBLISHED  IN  THE  JOURNAL 

PROLIFIC  INSTITUTIONS 

A  similar  process  was  used  to  develop  a  frequency  count  of 
institutional  address  appearances,  and  similar  apologies  are  extended 
to  institutions  not  listed.  The  most  prolific  institutions  follow  in 
order  of  decreasing  frequency.  It  should  be  noted,  especially  with 
regard  to  the  universities,  that  many  different  organizational 
components  may  be  included  under  the  single  organizational  heading. 
Lack  of  space  precluded  printing  out  the  components  under  the 
organizational  heading. 

For  RIA,  1125  institutions  are  represented  (average  2  papers  per 
institution,  and  2.64  authors  per  institution),  and  for  JACS,  750 
institutions  are  represented  (average  2.9  papers  per  institution,  and 
8.7  authors  per  institution) .  The  most  prolific  RIA  institutions  are 
almost  two  orders  of  magnitude  above  the  average  in  papers  generated, 
while  the  most  prolific  JACS  institutions  are  an  order  of  magnitude 
above  the  average.  These  differences  reflect  the  more  concentrated 
nature  of  JACS  papers  in  teams  and  institutions  relative  to  those  of 
RIA  papers.  Interestingly,  even  though  the  RIA  and  JACS  subject 
matter  are  very  different,  a  number  of  institutions  rank  as  the  most 
prolific  in  both  fields  (HARVARD  UNIV,  UNIV  OF  ILLINOIS,  YALE  UNIV, 
UNIV  OF  PENN,  UNIV  OF  MINNESOTA,  UNIV  OF  TEXAS,  UNIV  OF  WISCONSIN). 
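
The overlap between the two institution lists can be reproduced with a simple set intersection once the two naming styles (spaces and "OF" in the RIA table, hyphens in the JACS table) are normalized. This is only a sketch under stated assumptions: the cutoffs (the first 15 RIA entries and the first 18 JACS entries of Tables 3 and 3A) and the `normalize` helper are illustrative choices, not the procedure used in the study.

```python
# Leading entries of Table 3 (RIA) and Table 3A (JACS); cutoffs are assumed.
RIA_TOP = [
    "INST SCI INFORMAT", "HARVARD UNIV", "UNIV OF ILLINOIS",
    "HUNGARIAN ACAD SCI", "LEIDEN UNIV", "INDIANA UNIV",
    "UNIV OF MICHIGAN", "YALE UNIV", "UNIV OF PENN",
    "UNIV OF N CAROLINA", "UNIV OF MINNESOTA", "UNIV OF TEXAS",
    "UNIV OF LONDON", "JOHNS HOPKINS UNIV", "UNIV OF WISCONSIN",
]
JACS_TOP = [
    "MIT", "UNIV-ILLINOIS", "UNIV-TEXAS", "UNIV-CALIF-BERKELEY",
    "SCRIPPS-CLIN-&-RES-INST", "STANFORD-UNIV", "CALTECH",
    "HARVARD-UNIV", "NORTHWESTERN-UNIV", "UNIV-WISCONSIN",
    "DUPONT-CO-INC", "UNIV-MINNESOTA", "EMORY-UNIV", "UNIV-TORONTO",
    "UNIV-PENN", "PURDUE-UNIV", "CORNELL-UNIV", "YALE-UNIV",
]


def normalize(name):
    """Map both naming styles to one form: hyphens to spaces, drop 'OF'."""
    tokens = name.replace("-", " ").split()
    return " ".join(token for token in tokens if token != "OF")


# Institutions that appear near the top of both lists.
shared = {normalize(n) for n in RIA_TOP} & {normalize(n) for n in JACS_TOP}

for institution in sorted(shared):
    print(institution)
```

With these assumed cutoffs the intersection contains the seven institutions named in the text; a deeper cutoff would add further shared institutions.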

TABLE  3  MOST  PROLIFIC  INSTITUTIONS  -  RIA 

INST SCI INFORMAT 109; HARVARD UNIV 61; UNIV OF ILLINOIS 39;
HUNGARIAN ACAD SCI 35; LEIDEN UNIV 32; INDIANA UNIV 32; UNIV OF
MICHIGAN 31; YALE UNIV 25; UNIV OF PENN 23; UNIV OF N CAROLINA
22; UNIV OF MINNESOTA 21; UNIV OF TEXAS 21; UNIV OF LONDON 20;
JOHNS HOPKINS UNIV 20; UNIV OF WISCONSIN 19; PENN STATE UNIV 19;
CSIC 19; UNIV OF SUSSEX 18; OHIO STATE UNIV 17; CORNELL UNIV
17; UNIV OF PITTSBURGH 16; UNIV OF CAMBRIDGE 16; STANFORD UNIV
16; UNIV OF MARYLAND 15; UNIV OF CALIF SAN FRANCISCO 15; UNIV
OF CALIF DAVIS 14; DREXEL UNIV 14; UNIV OF IOWA 13; UNIV OF SO
CALIF 13; UNIV OF INSTELLING ANTWERP 13; UNIV OF CALIF BERKELEY
12; UNIV OF CALIF LOS ANGELES 12;

TABLE  3A  MOST  PROLIFIC  INSTITUTIONS  -  JACS 

MIT 67; UNIV-ILLINOIS 56; UNIV-TEXAS 51; UNIV-CALIF-BERKELEY
51; SCRIPPS-CLIN-&-RES-INST 49; STANFORD-UNIV 47; CALTECH
46; HARVARD-UNIV 43; NORTHWESTERN-UNIV 39; UNIV-WISCONSIN
38; DUPONT-CO-INC 37; UNIV-MINNESOTA 35; EMORY-UNIV 35;
UNIV-TORONTO 32; UNIV-PENN 32; PURDUE-UNIV 31; CORNELL-UNIV
30; YALE-UNIV 30; PRINCETON-UNIV 29; TEXAS-A&M-UNIV 29;
COLUMBIA-UNIV 27; OHIO-STATE-UNIV 27; MICHIGAN-STATE-UNIV 27;
UNIV-GEORGIA 25; INDIANA-UNIV 24; UNIV-PITTSBURGH 23;
HEBREW-UNIV-JERUSALEM 23; UNIV-CALIF-SAN-DIEGO 22; UNIV-TOKYO
22; UNIV-WASHINGTON 22; UNIV-ROCHESTER 22; UNIV-DELAWARE
21; TOKYO-INST-TECHNOL 21; PENN-STATE-UNIV 20; UNIV-N-CAROLINA
20; OSAKA-UNIV 19; KYOTO-UNIV 19; CNRS 18;
RUTGERS-STATE-UNIV 18; IOWA-STATE-UNIV-SCI-&-TECHNOL 17;
UNIV-MICHIGAN 17; UNIV-CALIF-IRVINE 17; UNIV-VIRGINIA 17;
UNIV-CALIF-SANTA-BARBARA 16; UNIV-ERLANGEN-NURNBERG 16;
NAGOYA-UNIV 16; UNIV-CALIF-DAVIS 16; UNIV-CALIF-LOS-ANGELES
16; UNIV-FLORIDA 15; UNIV-ALBERTA 15; UNIV-BRITISH-COLUMBIA
15; NATL-RES-COUNCIL-CANADA 15;

CODE:  THE  NUMBER  FOLLOWING  EACH  INSTITUTION  REPRESENTS  THE  NUMBER  OF 
TIMES  A  NAME  OF  A  REPRESENTATIVE  FROM  THAT  INSTITUTION  APPEARS  AS  AN 
AUTHOR  OR  CO-AUTHOR  IN  THE  LITERATURE  DATABASE 

PROLIFIC  COUNTRIES 

A similar process was used to develop a frequency count of country
appearances, based on the institutional addresses, and similar
apologies are extended to countries not listed. The most prolific
countries follow in order of decreasing frequency.

For  RIA,  56  countries  are  represented,  and  for  JACS,  44 
countries  are  represented.  The  United  States  is  about  an  order  of 
magnitude  more  prolific  than  its  nearest  competitor,  and  is  as 
prolific  as  its  major  competitors  combined.  In  the  four  studies 
performed  so  far  using  the  present  approach  (RIA,  Chemistry  [JACS], 
Near-Earth  Space,  Hypersonic-Supersonic  Flow) ,  this  dominant 
relationship  between  the  United  States  and  its  nearest  competitors  is 
observed.  Generically,  the  western  democracies  tend  to  be  the  most 
prolific.  In  addition,  Japan  is  in  the  first  JACS  tier  and  second 
RIA  tier;  Hungary  is  high  in  RIA;  and  India  and  Russia  are  both  well 
into  the  second  RIA  and  JACS  tiers. 

TABLE  4  MOST  PROLIFIC  COUNTRIES  -  RIA 

USA, 1595; UK, 279; CANADA, 138; NETHERLANDS, 80; GERMANY, 79;
FRANCE, 71; AUSTRALIA, 69; SPAIN, 58; HUNGARY, 46; BELGIUM, 45;
INDIA, 32; ISRAEL, 30; RUSSIA, 29; NORWAY, 25; JAPAN, 23;
ITALY, 22; SWEDEN, 21; DENMARK, 16; SOUTH-AFRICA, 16; MEXICO, 15;


TABLE  4A  MOST  PROLIFIC  COUNTRIES  -  JACS 


USA 2040; JAPAN 276; CANADA 168; GERMANY 148; FRANCE
116; UK 109; ITALY 97; SPAIN 58; SWITZERLAND 53; ISRAEL
48; NETHERLANDS 43; SWEDEN 40; AUSTRALIA 35; BELGIUM 18;
DENMARK 18; SOUTH-KOREA 18; INDIA 12; RUSSIA 12; TAIWAN
8;

CODE:  THE  NUMBER  FOLLOWING  EACH  COUNTRY  REPRESENTS  THE  NUMBER  OF 
TIMES  A  NAME  OF  A  REPRESENTATIVE  FROM  THAT  COUNTRY  APPEARS  AS  AN 
AUTHOR  OR  CO-AUTHOR  IN  THE  LITERATURE  DATABASE 


PROLIFIC  CITATIONS 

The  citations  in  all  2300  RIA  papers  were  aggregated  into  a  file 
of  over  37000  entries,  and  the  citations  in  all  2154  JACS  papers  were 
aggregated  into  a  file  of  over  85000  entries.  The  authors  most 
frequently  cited,  the  specific  papers  most  frequently  cited,  the 
journals  most  frequently  cited,  and  the  years  most  frequently  cited 
were  identified.  The  highly  cited  authors,  papers,  journals,  and 
years  are  presented  in  order  of  decreasing  frequency. 

While  the  numbers  of  RIA  and  JACS  papers  are  about  the  same, 
there  are  more  than  twice  as  many  citations  per  paper  on  average  in 
JACS  relative  to  RIA.  However,  many  of  the  RIA  articles  were 
editorials  or  editorial-like,  and  did  not  contain  references,  and 
therefore  no  conclusions  should  be  drawn  about  differences  in  numbers 
of  citations  per  journal  research  article  based  on  these  data. 

For  RIA,  there  are  30400  papers  and  18140  authors  cited  (average 
of  1.68  papers  per  author),  and  for  JACS,  there  are  64800  papers  and 
32450  authors  cited  (average  of  2  papers  per  author) .  Therefore, 
those  RIA  authors  that  do  cite  draw  from  a  modestly  wider  group  of 
authors  than  the  JACS  authors  that  cite.  For  RIA,  72%  of  authors 
cited  are  cited  once  and  14.5%  are  cited  twice,  while  for  JACS  60%  of 
authors  cited  are  cited  once  and  16.7%  are  cited  twice.  For  RIA, 
89.7%  of  the  papers  that  are  cited  are  cited  once  and  6.5%  are  cited 
twice,  while  for  JACS,  83%  of  the  papers  that  are  cited  are  cited 
once  and  11%  are  cited  twice.  Thus,  the  authors  cited  distribution 
seems  to  follow  the  more  classic  inverse  hyperbolic  Lotka's  Law  at 
low  citations,  while  the  paper  cited  distribution  follows  a  somewhat 
sharper  trajectory  closer  to  a  cubed  law. 
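
The exponents suggested above can be estimated from the once-cited and twice-cited shares alone. If the share of items cited n times falls off as a power law, f(n) proportional to n^(-a), then f(1)/f(2) = 2^a and a = log2(f(1)/f(2)). The sketch below applies this two-point estimate to the percentages quoted in the text; it is an illustrative check with assumed names, not a fit to the full distribution and not the method used in the study.

```python
import math

# Percent of cited items that were cited once and twice, as quoted in the text.
SHARES = {
    "RIA authors": (72.0, 14.5),
    "JACS authors": (60.0, 16.7),
    "RIA papers": (89.7, 6.5),
    "JACS papers": (83.0, 11.0),
}


def implied_exponent(share_once, share_twice):
    """Two-point power-law exponent: f(1)/f(2) = 2**a, so a = log2(f1/f2)."""
    return math.log2(share_once / share_twice)


exponents = {
    label: implied_exponent(once, twice)
    for label, (once, twice) in SHARES.items()
}

for label, a in exponents.items():
    print(f"{label}: exponent about {a:.2f}")
```

On these figures the author distributions give exponents near 2 (the inverse-square form usually associated with Lotka's Law), while the paper distributions give exponents between roughly 3 and 4, consistent with the sharper, approximately cubic fall-off described above.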

For  RIA,  a  number  of  the  most  highly  cited  authors  are  also  the 
most  prolific  (Garfield,  Narin,  Braun,  Schubert) .  These  particular 
authors  are  recognized  leaders  in  the  RIA  field,  and  their  work  also 
focuses  on  the  quantitative  aspects  of  RIA.  Because  of  the  time  lag 
between  papers  and  citations,  differences  should  be  expected  between 
the  most  prolific  authors  and  the  most  cited  authors.  Authors  who 
are  new  to  the  field  and  are  prolific  may  have  relatively  few 
citations.  Also,  some  established  authors  who  are  highly  cited  may 
require  substantial  time  to  produce  seminal  papers. 

For  JACS,  some  of  the  most  highly  cited  authors  are  also  the 
most  prolific  (Boger,  Trost) .  However,  some  of  the  prolific  authors 
could  have  been  highly  cited  in  other  journals,  which  would  not  have 
been  reflected  in  this  single  journal  study.  Also,  some  of  the 
highly  cited  authors  could  have  been  prolific  in  other  journals. 

For  RIA,  the  first  tier  of  highly  cited  papers  represents  many 
of  the  seminal  quantitative  approaches  (Garfield,  Schubert,  Small, 
Lotka) ,  while  the  second  tier  reflects  the  more  qualitative 
approaches  (Kuhn,  Price,  Cole) .  This  should  not  be  surprising,  since 
with  the  advent  of  fast  high-storage  computers  and  massive  databases, 
technology  enables  the  shifting  of  focus  to  more  quantitative  data- 
intensive  studies. 

For the JACS database, the most highly cited papers reflect the
evolution of metal-complex chemistry, with a continuing focus on
the reactions of transition metals (especially d-shell). There is a
clear, continued emphasis on the synthesis (i.e., first reported
formation) of a great variety of such complexes. Also reported are
novel applications of instrumental techniques, such as nuclear
magnetic resonance (NMR), X-ray diffraction, and mass spectrometry,
to determine the structure of new transition metal complexes,
especially those involving organic moieties as ligands. The body
of literature analyzed (1994 JACS) clearly shows an increasing
utilization of computer-based techniques, such as ab initio and
other molecular orbital calculations and molecular mechanics
approaches, to elucidate structure and to provide guidance in
understanding mechanisms of formation and the catalytic pathways
mediated by a growing body of complexes.

For  RIA,  the  most  highly  cited  journals  are  congruent  with  the 
most  prolific  journals.  The  top  five  cited  journals  (Scientometrics, 
JASIS,  Science,  Nature,  JAMA)  are  within  the  top  seven  prolific 
journals  (if  Current  Contents  is  treated  as  a  single  journal) .  One 
would  expect  more  congruence  between  the  highly  cited  and  highly 
prolific  journals  (and  most  highly  cited  and  prolific  institutions, 
if  the  data  were  available)  than  between  the  highly  cited  and 
prolific  authors.  The  time  lags  between  publication  and  citation  are 
not  insignificant  relative  to  the  span  of  an  author's  productive 
career,  whereas  the  time  lags  for  journals  (and  institutions)  are 
relatively  smaller  compared  to  the  period  over  which  a  journal  (or 
institution)  has  established  a  reputation  for  publishing  quality  in 
given  fields. 

The  JACS  authors  cited  6725  different  journals  and  other 
sources,  with  an  average  of  over  12.6  citations  per  journal. 
However,  the  most  highly  cited  journal  by  far  is  JACS,  receiving  25% 
of  total  citations,  or  three  orders  of  magnitude  higher  citations 
than  average.  Its  citations  equal  those  of  the  next  seven  most  cited 
journals  combined. 
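
The two comparisons in this paragraph can be checked against the counts listed in Table 7A. The sketch below is illustrative only; the variable names are assumptions, and the average of 12.6 citations per journal is the rounded figure quoted in the text.

```python
import math

# Citation counts taken from Table 7A (JACS and the next seven journals).
JACS_CITATIONS = 17883
NEXT_SEVEN = {
    "J-ORG-CHEM": 3257,
    "J-CHEM-PHYS": 2916,
    "TETRAHEDRON-LETT": 2593,
    "J-PHYS-CHEM-US": 2496,
    "INORG-CHEM": 2204,
    "BIOCHEMISTRY-US": 1799,
    "ANGEW-CHEM-INT-EDIT": 1795,
}
AVG_CITATIONS_PER_JOURNAL = 12.6  # rounded average quoted in the text

# Combined citations of the next seven journals, and their ratio to JACS.
next_seven_total = sum(NEXT_SEVEN.values())
ratio_to_jacs = next_seven_total / JACS_CITATIONS

# Orders of magnitude by which JACS exceeds the average cited source.
orders_above_average = math.log10(JACS_CITATIONS / AVG_CITATIONS_PER_JOURNAL)

print(f"next seven journals combined: {next_seven_total}")
print(f"ratio to JACS citations: {ratio_to_jacs:.2f}")
print(f"JACS vs. average: {orders_above_average:.1f} orders of magnitude")
```

On these counts the next seven journals together come to roughly 95% of the JACS total, and JACS sits a little over three orders of magnitude above the average cited source.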


TABLE  5  MOST  CITED  AUTHORS  -  RIA 

GARFIELD-E 870; NARIN-F 181; PRICE-DD 159; BRAUN-T 142;
SMALL-H 141; SCHUBERT-A 139; MORAVCSIK-MJ 105; EGGHE-L 90;
MERTON-RK 90; MOED-HF 82; MCCAIN-KW 78; COLE-S 77;
LEYDESDORFF-L 77; ZUCKERMAN-H 77; BROOKES-BC 72; CALLON-M 71;
GRILICHES-Z 70; ARUNACHALAM-S 69; COLE-JR 66; NEDERHOF-AJ 65;
SMALL-HG 65; MARTIN-BR 64; LINDSEY-D 61; KOSTOFF-RN 60;
CRANE-D 58; CRONIN-B 57; ALLISON-PD 56; FRAME-JD 54;
CHUBIN-DE 53; MACROBERTS-MH 53; LINE-MB 52; PAO-ML 52;
CICCHETTI-DV 51; IRVINE-J 51; VINKLER-P 51; KUHN-TS 50;
VANRAAN-AFJ 50; LONG-JS 49; CARPENTER-MP 48; ABT-HA 47;
PERITZ-BC 46; PRICE-DJD 46; VLACHY-J 46; HARGENS-LL 45;
HAMILTON-DP 44; NALIMOV-W 43; WHITE-HD 43; COURTIAL-JP 42;
LOTKA-AJ 40;

TABLE  5A  MOST  CITED  AUTHORS  -  JACS 

BOGER-DL 307; FRISCH-MJ 225; TROST-BM 175; DEWAR-MJS 171;
COREY-EJ 154; COLLMAN-JP 127; EVANS-DA 120; HEHRE-WJ 119;
BORDWELL-FG 116; WIBERG-KB 116; OLAH-GA 114; JORGENSEN-WL 108;
COTTON-FA 106; POPLE-JA 102; NICOLAOU-KC 99; ADAM-W 95;
LIAS-SG 87; LEHN-JM 86; MOSS-RA 86; BAX-A 82; PAQUETTE-LA 82;
MARCUS-RA 73; EVANS-WJ 71; HOFFMANN-R 71; ALLINGER-NL 64;
CURRAN-DP 64; BROWN-HC 63; DUNNING-TH 62; BECKWITH-ALJ 60;
CRABTREE-RH 60; SHELDRICK-GM 60; BROOKHART-M 59; TURRO-NJ 59;
DENMARK-SE 58; GOULD-IR 58; REED-AE 58; STILL-WC 58;
BERNARDI-F 56; CRAM-DJ 56; NEGISHI-E 56; NEWCOMB-M 56;
PAULING-L 56; BALDWIN-JE 55; KUBAS-GJ 55; HOUK-KN 54;
YAMAMOTO-Y 54; BARTON-DHR 53; JENCKS-WP 53; BECKE-AD 52;
DOYLE-MP 52; GROVES-JT 52; ARDUENGO-AJ 51;

CODE:  THE  NUMBER  FOLLOWING  EACH  AUTHOR'S  NAME  REPRESENTS  THE  NUMBER 
OF  TIMES  THIS  PERSON  WAS  FIRST  AUTHOR  OF  A  REFERENCE  CITED  IN  THE 
LITERATURE  DATABASE 

TABLE  6  MOST  CITED  PAPERS  -  RIA 

GARFIELD-E-1979-CITATION-INDEXING 55
SCHUBERT-A-1989-SCIENTOMETRICS-V16-P3 40
GARFIELD-E-1972-SCIENCE-V178-P471 40
SMALL-H-1973-J-AM-SOC-INFORM-SCI-V24-P265 35
LOTKA-AJ-1926-J-WASHINGTON-ACADEMY-V16-P317 35
KUHN-TS-1970-STRUCTURE-SCI-REVOLU 33
PRICE-DD-1963-LITTLE-SCI-BIG-SCI 32
COLE-JR-1973-SOCIAL-STRATIFICATIO 29
NARIN-F-1976-EVALUATIVE-BIBLIOMET 27
SMITH-LC-1981-LIBR-TRENDS-V30-P83 25
CRANE-D-1972-INVISIBLE-COLLEGES 24
PETERS-DP-1982-BEHAVIORAL-BRAIN-SCI-V5-P187 22
MERTON-RK-1973-SOCIOLOGY-SCI 22
MARTIN-BR-1983-RES-POLICY-V12-P61 22
SMALL-HG-1974-SCI-STUD-V4-P17 21
HAMILTON-DP-1990-SCIENCE-V250-P1331 20
MORAVCSIK-MJ-1975-SOC-STUD-SCI-V5-P86 19
KING-J-1987-J-INFORM-SCI-V13-P261 19
HOWARD-GS-1987-AM-PSYCHOL-V42-P975 19

TABLE  6A  MOST  CITED  PAPERS  -  JACS 


FRISCH-MJ-1992-GAUSSIAN-92,90
HEHRE-WJ-1986-AB-INITIO-MOL-ORBITA,65
DEWAR-MJS-1985-J-AM-CHEM-SOC-V107-P3902,50
FRISCH-MJ-1990-GAUSSIAN-90,39
HARIHARAN-PC-1973-THEOR-CHIM-ACTA-V28-P213,39
LIAS-SG-1988-J-PHYS-CHEM-REF-D-S1-V17,38
MOLLER-C-1934-PHYS-REV-V46-P618,38
STILL-WC-1978-J-ORG-CHEM-V43-P2923,28
HEHRE-WJ-1972-J-CHEM-PHYS-V56-P2257,24
LEHN-JM-1988-ANGEW-CHEM-INT-EDIT-V27-P89,24
MCMILLEN-DF-1982-ANNU-REV-PHYS-CHEM-V33-P493,23
REED-AE-1988-CHEM-REV-V88-P899,23
BECKE-AD-1988-PHYS-REV-A-V38-P3098,22
WEINER-SJ-1984-J-AM-CHEM-SOC-V106-P765,21
BONDI-A-1964-J-PHYS-CHEM-US-V68-P441,20
MOHAMADI-F-1990-J-COMPUT-CHEM-V11-P440,20
VOSKO-SH-1980-CAN-J-PHYS-V58-P1200,20
FRISCH-MJ-1992-GAUSSIAN-92-REVISION,19
JORGENSEN-WL-1983-J-CHEM-PHYS-V79-P926,19
POPLE-JA-1976-INT-J-QUANTUM-CHEM-S-V10-P1,19
WUTHRICH-K-1986-NMR-PROTEINS-NUCLEIC,19
HAY-PJ-1985-J-CHEM-PHYS-V82-P299,18
MARCUS-RA-1985-BIOCHIM-BIOPHYS-ACTA-V811-P265,18
PARR-RG-1989-DENSITY-FUNCTIONAL-T,18

TABLE  7  MOST  CITED  JOURNALS  -  RIA 


SCIENTOMETRICS,1343; J-AM-SOC-INFORM-SCI,679; SCIENCE,646;
NATURE,388; JAMA-J-AM-MED-ASSOC,387; AM-PSYCHOL,346;
SOC-STUD-SCI,324; J-DOC,276; NEW-ENGL-J-MED,268;
RES-POLICY,251; CURR-CONTENTS,245; AM-SOCIOL-REV,222;
J-INFORM-SCI,183; COLL-RES-LIBR,141; LANCET,138;
AM-ECON-REV,123; ANN-INTERN-MED,115; ESSAYS-INFORMATION-S,114;
BRIT-MED-J,113; J-PERS-SOC-PSYCHOL,113; J-APPL-PSYCHOL,109;
INFORM-PROCESS-MANAG,98; PSYCHOL-BULL,98;


TABLE  7A  MOST  CITED  JOURNALS  -  JACS 


J-AM-CHEM-SOC 17883; J-ORG-CHEM 3257; J-CHEM-PHYS 2916;
TETRAHEDRON-LETT 2593; J-PHYS-CHEM-US 2496; INORG-CHEM 2204;
BIOCHEMISTRY-US 1799; ANGEW-CHEM-INT-EDIT 1795;
J-CHEM-SOC-CHEM-COMM 1568; ORGANOMETALLICS 1312; SCIENCE
1226; CHEM-PHYS-LETT 1051; CHEM-REV 1039; TETRAHEDRON
997; ACCOUNTS-CHEM-RES 985; P-NATL-ACAD-SCI-USA 858;
J-BIOL-CHEM 813; NATURE 800; J-ORGANOMET-CHEM 721; UNPUB
681; J-CHEM-SOC 612; J-MOL-BIOL 525; CAN-J-CHEM 507; CHEM-BER
472; J-MAGN-RESON 470; J-COMPUT-CHEM 418; BIOCHIM-BIOPHYS-ACTA
379; ACTA-CRYSTALLOGR-B 361; B-CHEM-SOC-JPN 359; HELV-CHIM-ACTA
346; PURE-APPL-CHEM 342; CHEM-LETT 334; SYNTHESIS-STUTTGART
328; CHEM-PHYS 283; MACROMOLECULES 278; J-ANTIBIOT 277;
ANGEW-CHEM 255; J-MED-CHEM 250; BIOPOLYMERS 242; LANGMUIR
239; MOL-PHYS 233; PHYS-REV-B 232; ANAL-CHEM 225;
INT-J-MASS-SPECTROM 222; NUCLEIC-ACIDS-RES 222;
J-CHEM-SOC-DALTON 215; J-CHEM-SOC-DA 209;
BIOCHEM-BIOPH-RES-CO 204; THEOR-CHIM-ACTA 202;

TABLE  8  MOST  CITED  YEARS  -  RIA 

1990,3092;  1989,2826;  1991,2726;  1988,2580;  1987,2177; 

1992,2094;  1986,1942;  1985,1773;  1984,1436;  1983,1288; 

1982,1217;  1993,1122;  1981,1092;  1979,1023;  1980,981; 

TABLE  8A  MOST  CITED  YEARS  -  JACS 

1992 8297; 1993 7764; 1991 7470; 1990 6265; 1989 5282; 1988
4742; 1987 4072; 1986 3499; 1985 3299; 1984 2757; 1983
2445; 1982 2372; 1980 1991; 1981 1874; 0 1711; 1994
1669; 1978 1625; 1979 1537; 1977 1380; 1976 1343;

CODE:  THE  NUMBER  FOLLOWING  EACH  PAPER  REPRESENTS  THE  NUMBER  OF  TIMES 
THE  PAPER  WAS  CITED  IN  THE  LITERATURE  DATABASE 

PROLIFIC  KEYWORDS 

A  similar  process  was  used  to  obtain  prolific  keyword 
appearances.  The  paucity  of  RIA  keywords  is  due  to  the  fact  that 
relatively  few  authors  submitted  keywords  to  the  database.  There  are 
approximately  an  order  of  magnitude  more  keywords  from  JACS . 

For  RIA,  the  keywords,  when  viewed  as  an  integral  whole, 
describe  the  following  RIA  scenario:  Use  of  Peer  Review  and 
quantitative  Performance  Indicators  such  as  Citation  Analysis  and 
Bibliometrics  for  the  purpose  of  Quality  Assurance  of  University 
Publications  from  Medical  and  Educational  Research. 

For  JACS,  the  keywords,  when  viewed  as  an  integrated  whole, 
describe  the  following  scenario  of  chemistry  as  reflected  in  JACS:  a 
continued  focus  on  the  synthesis  of  transition  and  heavy-metal 
complexes,  and  the  elucidation  of  formation  pathways  (mechanisms)  and 
structure  of  the  various  complexes.  There  is  a  continued  emphasis  on 
possible  catalytic  activity  (especially  redox  reactions)  associated 
with the complexes, and an increasing examination of the biological
aspects of transition metal complex chemistry. Indeed, some cited
work clearly examines the interactions of metals with bio-molecules
such as proteins, with the metals catalyzing protein formation
and/or controlling protein conformations. Also, the cited papers
deal at length with instrumental techniques associated with
metal-complex structure elucidation. As only one metal-complex
structure out of many possible may prove to be active, structure
elucidation is clearly of interest within the research community.
TABLE 9 MOST PROLIFIC KEYWORDS - RIA

PEER REVIEW 19; RESEARCH 13; CITATION 7; CITATION ANALYSIS 7;
CITATIONS 6; PUBLICATION 4; PERFORMANCE INDICATORS 4;
BIBLIOMETRICS 4; UNIVERSITIES 3; QUALITY ASSURANCE 3;
PUBLISHING 3; PUBLICATIONS 3; PREVENTION 3; PERFORMANCE 3;
MEDICAL RESEARCH 3; ITALY 3; EDUCATIONAL RESEARCH 3; EDUCATION
3; DECISION SUPPORT SYSTEMS 3;

TABLE  9A  MOST  PROLIFIC  KEYWORDS  -  JACS 

COMPLEXES 220; CHEMISTRY 146; DERIVATIVES 120; SPECTROSCOPY 110;
MECHANISM 108; MOLECULES 80; CRYSTAL-STRUCTURE 77; BINDING 68;
ABINITIO 64; REACTIVITY 63; SPECTRA 61; PROTEINS 59; COMPLEX 56;
LIGANDS 56; GAS-PHASE 54; ACID 53; 1 51; ENERGIES 47; WATER 46;
MODEL 43; ORGANIC-SYNTHESIS 42; RESOLUTION 40; SYSTEMS 40; NMR
40; BOND 38; STRUCTURE 37; NUCLEAR-MAGNETIC-RESONANCE 37;
RECOGNITION 37; CLEAVAGE 37; OXIDATION 37; MOLECULAR-STRUCTURE 36;
PROTEIN 35; IONS 35; ALCOHOLS 35; GENERATION 35; DESIGN 35;
DYNAMICS 33; CARBON 32; KETONES 32; DNA 31; RESONANCE 31;
KINETICS 31; ESTERS 30; ACTIVATION 30; ELECTRON-TRANSFER 30;
ELECTRONIC-STRUCTURE 30; AQUEOUS-SOLUTION 30; NUCLEAR
MAGNETIC-RESONANCE 29; STEREOCHEMISTRY 29; REDUCTION 29; STATE 28;
EXCHANGE 28; ANALOGS 27; CRYSTAL 27; HYDROGEN 27; PHOTOCHEMISTRY
26; LIGAND 26; REACTIONS 26; COORDINATION 25; DEPENDENCE 25;

CODE:  THE  NUMBER  AFTER  EACH  KEYWORD  REPRESENTS  THE  NUMBER  OF  TIMES 
THE  KEYWORD  APPEARED  IN  THE  PAPERS  OF  THE  LITERATURE  DATABASE 

PERVASIVE  THEMES 

To obtain pervasive themes, single-, double-, and triple-word
phrases were extracted from the text of the database, and the
high-frequency phrases with high technical content were identified
as the pervasive themes. In this particular exercise, the databases
for RIA and JACS were each split into two parts (titles and
abstracts), and the analysis was done on each part. The titles of
the papers were put into a separate database, and the multiword
frequency analysis was performed. The abstracts of the papers
constituted a separate database as well.
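
The multiword frequency analysis described above can be sketched as a count of contiguous n-word phrases. The example below is a minimal illustration under assumptions, not the software used in the study: the tokenization rule (uppercase the text, keep hyphenated words as single tokens, keep connecting words such as AND and OF) is a guess, the function name `phrase_counts` is hypothetical, and the three sample titles are made up for demonstration.

```python
import re
from collections import Counter


def phrase_counts(texts, n):
    """Count contiguous n-word phrases across a list of titles or abstracts."""
    counts = Counter()
    for text in texts:
        # Uppercase, then keep letters, digits and internal hyphens together
        # so that a term like PEER-REVIEW is treated as one word.
        tokens = re.findall(r"[A-Z0-9][A-Z0-9\-]*", text.upper())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up sample titles, for illustration only.
sample_titles = [
    "Citation analysis of peer review",
    "Peer review and citation analysis",
    "Performance indicators in citation analysis",
]

single = phrase_counts(sample_titles, 1)
double = phrase_counts(sample_titles, 2)
triple = phrase_counts(sample_titles, 3)

# Print the most frequent double-word phrases in "count PHRASE" form.
for phrase, count in double.most_common(3):
    print(count, phrase)
```

Selecting the phrases with high technical content from such raw counts remains a manual judgment step in the process described above; the sketch covers only the counting.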

Following are the raw data outputs from these two sub-databases
for both RIA and JACS. The number preceding the phrase is the
frequency of appearance of the phrase in the database. Those
phrases in RIA which are relatively specific are underlined, and will
be used for future literature searches as keywords. The major themes
include quantitative RIA approaches such as BIBLIOMETRICS/
SCIENTOMETRICS/CITATIONS, qualitative approaches such as PEER
REVIEW, and more generic terms such as (RESEARCH or SCIENCE)
PRODUCTIVITY/OUTPUT/PERFORMANCE/BENEFIT/IMPACT.

The major Chemistry themes as reflected in JACS include study of
Reactions (RATE CONSTANTS, TRANSITION STATE, ELECTRON TRANSFER,
DIELS-ALDER) and Complexes (SPACE GROUP, TRANSITION-METAL,
MOLECULAR-HYDROGEN, CRYSTAL STRUCTURE) using both experimental
approaches (X-RAY DIFFRACTION, NMR SPECTROSCOPY, MASS SPECTROMETRY)
and computational approaches (COMPUTATIONAL QUANTUM CHEMISTRY, AB
INITIO MOLECULAR ORBITAL METHODS, MOLECULAR MECHANICS CALCULATIONS).

TABLE  10  TITLE  DOUBLE  WORD  FREQUENCIES  -  RIA 

315 CITATION-CLASSIC COMMENTARY; 116 CITATION- CLASSIC; 115 CLASSIC
COMMENTARY; 43 CITATION ANALYSIS; 35 PERFORMANCE INDICATORS; 24
RESEARCH PRODUCTIVITY; 22 EVALUATION RESEARCH; 20 BIBLIOMETRIC
ANALYSIS; 16 RESEARCH PERFORMANCE; 14 SCIENTIFIC PRODUCTIVITY; 13
PEER-REVIEW PROCESS; 13 SCIENTIFIC LITERATURE; 12 LITTLE
SCIENTOMETRICS; 11 BIBLIOMETRIC STUDY; 11 SCIENTIFIC PRODUCTION;
10 CITATION IMPACT; 10 PUBLICATION PRODUCTIVITY; 10 RESEARCH IMPACT;
9 BIBLIOMETRIC INDICATORS; 9 CHOLESTEROL LOWERING; 9 CITATION
INDEX; 9 CITATION INDEXES; 9 CITATION PATTERNS; 9 LOWERING
TRIALS; 9 PEER REVIEWERS; 9 REFORM OPTIONS; 9 RESEARCH
ASSESSMENT; 9 SCIENCE CITATION; 9 SCIENTIFIC PERFORMANCE; 8 BIG
SCIENTOMETRICS; 8 CITATION RATES; 8 INTERNATIONAL SCIENTIFIC;
8 QUALITATIVE EVALUATION; 8 RESEARCH METHODS; 8 SCIENTOMETRICS
BIG; 7 ASSESSMENT EXERCISE; 7 CITATION DATA; 7 PEER- REVIEW;
7 RESEARCH BENEFITS; 7 RESEARCH EVALUATION; 7 SCIENCE POLICY;
7 SCIENTIFIC COLLABORATION; 7 UNITED-STATES SCIENCE; 6 CITATION
COUNTS; 6 CONSUMER RESEARCH; 6 EDITORIAL PEER-REVIEW; 6 IMPACT
ASSESSMENT; 6 JOURNAL ARTICLES; 6 LEDERBERG JOSHUA; 6 MEDICINE
VOL; 6 NOBEL CLASS; 6 PEER-REVIEWED JOURNALS; 6 PEERLESS
SCIENCE; 6 QUANTITATIVE INDICATORS;

CODE:  THE  NUMBER  FOLLOWING  EACH  WORD  PAIR  REPRESENTS  THE  NUMBER  OF 
TIMES  THE  WORD  PAIR  APPEARED  IN  ALL  THE  TITLES  OF  THE  LITERATURE 
DATABASE 


TABLE  11  TITLE  TRIPLE  WORD  FREQUENCIES  -  RIA 

115 CITATION- CLASSIC COMMENTARY; 17 RESEARCH AND EVALUATION; 11
EVALUATION AND RESEARCH; 10 EVALUATION OF RESEARCH; 9 CHOLESTEROL
LOWERING TRIALS; 9 CITATION AND OUTCOME; 9 FREQUENCY OF CITATION;
8 LIBRARY AND INFORMATION-SCIENCE; 8 LITTLE SCIENTOMETRICS BIG;
8 OPTIONS FOR PEER-REVIEW; 8 OUTCOME OF CHOLESTEROL; 8 SCIENCE
AND TECHNOLOGY; 8 SCIENTOMETRICS BIG SCIENTOMETRICS; 7 INDICATORS
IN HIGHER-EDUCATION; 7 RESEARCH ASSESSMENT EXERCISE; 6 INTENT OF
PEER-REVIEWED; 6 PEER-REVIEW AND UNITED-STATES; 6 RELIABILITY OF
PEER-REVIEW; 6 REPRINTED FROM SCIENCE; 6 RESEARCH IMPACT
ASSESSMENT; 6 SCIENTOMETRICS AND BEYOND; 6 UNITED-STATES SCIENCE
POLICY; 5 APPLICATIONS FOR RESEARCH; 5 COMMENTARY ON STUDIES;
5 COMMUNICATION AND BIBLIOMETRICS; 5 EVALUATION AND TEACHING; 5
INQUIRY FOR LIBRARY-SCIENCE; 5 INTERNATIONAL SCIENTIFIC
COLLABORATION; 5 METHODS AND APPLICATIONS; 5 QUALITY OF CARE;
5 REPRINTED FROM THEORETICAL; 5 THEORETICAL MEDICINE VOL;

TABLE  12  TITLE  SINGLE  WORD  FREQUENCIES  -  RIA 

432 COMMENTARY; 390 RESEARCH; 317 CITATION-CLASSIC; 273 PEER-REVIEW;
258 CITATION; 158 SCIENCE; 153 ANALYSIS; 151 SCIENTIFIC; 137
EVALUATION; 121 CLASSIC; 117 CITATION-; 105 PERFORMANCE; 87
INDICATORS; 80 JOURNALS; 72 BIBLIOMETRIC; 72 CITATIONS; 71
PRODUCTIVITY; 70 IMPACT; 66 LITERATURE; 66 STUDY; 61 ASSESSMENT;
61 JOURNAL; 54 DEVELOPMENT; 54 PUBLICATION; 53 REVIEW; 52
QUALITY; 49 INTRODUCTION; 45 STUDIES; 44 SCIENTOMETRICS; 41
REPRINTED; 41 VOL; 39 INTERNATIONAL; 38 METHOD; 38 METHODS; 37
PG; 35 DATA; 35 INFORMATION; 35 PAPERS; 33 STRUCTURE; 33 THEORY;
31 PATTERNS; 31 POLICY; 31 PROCESS; 31 PUBLICATIONS; 30
PSYCHOLOGY; 30 SYSTEM; 29 CASE; 29 PRODUCTION; 28 COMMUNICATION;
28 EVALUATING; 28 INFLUENCE; 28 REPLY; 28 SYSTEMS; 28 TECHNOLOGY;
27 BEHAVIOR; 27 BIBLIOMETRICS; 27 COMPARISON; 27 SCIENTOMETRIC;
26 CLINICAL; 26 MEDICAL; 25 ARTICLES; 25 EFFECTS; 25 HUMAN; 25


TABLE  13  ABSTRACT  DOUBLE  WORD  FREQUENCIES  -  RIA 

152 PEER REVIEW; 54 EVALUATION RESEARCH; 52 CITATION INDEX; 44
HEALTH CARE; 44 PERFORMANCE INDICATORS; 38 CITATION ANALYSIS; 38
SCIENCE CITATION; 34 UNITED STATES; 30 SOCIAL SCIENCES; 29 REVIEW
PROCESS; 26 ARTICLES PUBLISHED; 25 INFORMATION SCIENCE; 25
RESEARCH PRODUCTIVITY; 23 IMPACT FACTOR; 21 PAPERS PUBLISHED; 21
SCIENTIFIC RESEARCH; 20 RESEARCH PERFORMANCE; 19 JOURNAL ARTICLES;
19 SOCIAL SCIENCE; 18 HIGHLY CITED; 18 RESEARCH ASSESSMENT; 18
TOTAL NUMBER; 17 CITATION RATES; 17 PAPER PRESENTS; 17 RESEARCH
OUTPUT; 17 SCIENTIFIC PRODUCTIVITY; 16 CITATION PATTERNS; 16
EVALUATIVE RESEARCH; 16 MENTAL HEALTH; 15 CHEMICAL ENGINEERING; 15
HEALTH PROMOTION; 15 INFORMATION RETRIEVAL; 15 PAPER DESCRIBES; 15
SCIENTIFIC COMMUNITY; 15 SCIENTIFIC LITERATURE; 14

CODE:  THE  NUMBER  FOLLOWING  EACH  WORD  PAIR  REPRESENTS  THE  NUMBER  OF 
TIMES  THE  WORD  PAIR  APPEARED  IN  ALL  THE  ABSTRACTS  OF  THE  LITERATURE 
DATABASE 


TABLE  14  ABSTRACT  TRIPLE  WORD  FREQUENCIES  -  RIA 

36 SCIENCE CITATION INDEX; 31 QUALITY OF CARE; 24 NUMBER OF
CITATIONS; 23 SCIENCE AND TECHNOLOGY; 18 LIBRARY AND INFORMATION;
18 RESEARCH AND EVALUATION; 16 PEER REVIEW PROCESS; 13 NUMBER OF
PUBLICATIONS; 12 EVALUATION OF RESEARCH; 12 NUMBER OF PAPERS; 11
NUMBER OF AUTHORS; 10 RESEARCH AND DEVELOPMENT; 9 RESEARCH
ASSESSMENT EXERCISE; 8 EVALUATION AND RESEARCH; 8 JOURNAL
CITATION REPORTS; 8 MAIN OUTCOME MEASURES; 8 NUMBER OF ARTICLES;
8 QUALITY OF LIFE; 8 SCIENCES CITATION INDEX; 8 SOCIAL SCIENCES
CITATION; 7 CITATION INDEX SCI; 7 CITING AND CITED; 7 PUBLISHED
IN JOURNALS; 7 QUANTITATIVE AND QUALITATIVE; 7 SCIENTIFIC AND
TECHNOLOGICAL; 7 SCIENTISTS AND ENGINEERS; 7 SOCIAL WORK JOURNALS;
6 CORONARY HEART DISEASE; 6 INSTITUTES OF HEALTH; 6 JOURNAL OF
CLINICAL; 6 NATURE OF SCIENCE; 6 PUBLICATION AND CITATION; 6
RESEARCH AND ASSESSMENT;
TABLE 15 ABSTRACT SINGLE WORD FREQUENCIES - RIA

1189 RESEARCH; 386 CITATION; 368 JOURNALS; 359 STUDY; 343 ANALYSIS;
338 DATA; 319 SCIENTIFIC; 313 SCIENCE; 296 REVIEW; 269 ARTICLES; 268
JOURNAL; 262 INFORMATION; 252 PAPER; 246 CITATIONS; 239 AUTHORS; 238
QUALITY; 236 PAPERS; 232 PERFORMANCE; 228 NUMBER; 226 EVALUATION; 203
PUBLISHED; 200 ARTICLE; 196 STUDIES; 195 SOCIAL; 193 PEER; 190 TWO;
188 HEALTH; 185 IMPACT; 185 LITERATURE; 182 BASED; 165 PROCESS; 161
FIELD; 160 CARE; 154 INDICATORS; 152 SYSTEM; 151 DEVELOPMENT; 151
MODEL; 148 PRODUCTIVITY; 146 YEARS; 145 CITED; 143 PUBLICATIONS; 135
ASSESSMENT; 133 METHODS; 128 MEDICAL; 124 PUBLICATION; 120 POLICY;
119 COUNTRIES; 115 FOUND; 115 INDEX; 114 AREAS; 111 CLINICAL; 110
FINDINGS; 109 GROUP; 109 TECHNOLOGY; 105 DIFFERENCES; 104 FACULTY;
103 MEASURES; 100 LEVEL; 100

TABLE  10A  TITLE  DOUBLE  WORD  FREQUENCIES  -  JACS 

56 TOTAL SYNTHESIS; 52 CHEM ENGN; MOLECULAR RECOGNITION; 38
PHYS CHEM; 35 RATE CONSTANTS; 34 MOLEC BIOL; 31 NUCLEAR
MAGNETIC-RESONANCE; 28 STRUCTURAL CHARACTERIZATION; 28 THEORET
CHEM; 26 EXPTL STN; 26 TRANSITION-METAL COMPLEXES; 24
PHARMACEUT SCI; 24 RAY CRYSTAL-STRUCTURE; 24 USA
TRANSITION-METAL; 23 DIELS-ALDER REACTIONS; 22 RADICAL CATIONS;
22 X-RAY STRUCTURE; 21 MOLECULAR-ORBITAL METHODS; 21
RESONANCE RAMAN; 20 ENANTIOSELECTIVE SYNTHESIS; 20
STEREOSELECTIVE SYNTHESIS; 19 AB-INITIO STUDY; 19 ANORGAN CHEM;
19 BOND ACTIVATION; 19 IRON III; 19 MOLECULAR MECHANICS;
19 USA NUCLEAR-MAGNETIC-RESONANC; 18 REDUCTIVE ELIMINATION; 17
OXIDATIVE ADDITION; 16 ANTITUMOR ANTIBIOTICS; 16 CARBENE
COMPLEXES; 16 II COMPLEXES; 16 MOLECULAR- STRUCTURE; 16
POTENTIAL-ENERGY SURFACE; 15 DIELS-ALDER REACTION; 15 ISOTOPE
EFFECTS; 15 RUTHENIUM II; 14 CRYSTAL- STRUCTURE; 14
ELECTRON-TRANSFER REACTIONS; 14 III COMPLEXES; 14 PHOTOINDUCED
ELECTRON-TRANSFER; 14 SELF-ASSEMBLED MONOLAYERS; 13 MOLECULAR
CALCULATIONS; 13 RIBONUCLEOTIDE REDUCTASE; 13 SOLID-STATE NMR;

TABLE  11A  TITLE  TRIPLE  WORD  FREQUENCIES  -  JACS 

13  USA  NUCLEAR  MAGNETIC- RE SONANCE ;  13  USA  TRANSITION-METAL 
COMPLEXES;  11  COMPUTAT  QUANTUM  CHEM;  11  USA  DIELS-ALDER 
REACTIONS;  10  ABSOLUTE  RATE  CONSTANTS;  10  SYNTHESIS  AND 
CHARACTERIZATION;  10  USA  CONVERGENT  FUNCTIONAL-GROUPS;  9 
EFFECTIVE  CORE  POTENTIALS;  9  SYNTHESIS  AND  STRUCTURE;  9  USA 
MOLECULAR-ORBITAL  METHODS;  8  PREPARATION  AND  CHARACTERIZATION; 

7  KINETIC  ISOTOPE  EFFECTS;  7  POTENT  ANTITUMOR  ANTIBIOTICS; 

6  C-H  BOND  ACTIVATION;  6  ENHANCED  FUNCTIONAL  ANALOGS;  6  USA 
METAL-PROMOTED CYCLIZATION; 6 USA MOLECULAR MECHANICS; 6 USA
RAY CRYSTAL-STRUCTURE; 5 ATOMIC BASIS SETS; 5 BASIS SETS
FIRST-ROW;  5  GAUSSIAN  BASIS  FUNCTIONS;  5  IRON  III  COMPLEXES; 

5  MARCUS  INVERTED  REGION;  5  MOLECULAR  MECHANICS  CALCULATIONS; 

5  NONCOVALENT  BINDING  SELECTIVITY;  5  PREPARATION  AND 
PROPERTIES;  5  SETS  FIRST-ROW  ATOMS;  5  STRUCTURE  AND 
REACTIVITY; 5 SYNTHESIS AND REACTIVITY; 5 USA
MOLECULAR-HYDROGEN COMPLEXES; 4 AB-INITIO CALCULATIONS; 4
AB-INITIO MOLECULAR-ORBITAL STUDY; 4 ALPHA BETA-UNSATURATED;
4 ANTITUMOR ANTIBIOTIC CC-1065; 4 ASYMMETRIC TOTAL SYNTHESIS;
4 BRIDGED TETRAHYDROINDENYL LIGANDS; 4 CHIRAL TITANOCENE
CATALYST; 4 CONICAL INTERSECTIONS IDENTICAL; 4 DENSITY
FUNCTIONAL THEORY; 4 ELECTROCHEMISTRY OF SPONTANEOUSLY; 4
ENGLAND POTENTIAL-ENERGY SURFACES; 4 ESCHERICHIA-COLI
RIBONUCLEOTIDE REDUCTASE; 4 EXPERIMENTAL AND THEORETICAL; 4
EXPERIMENTAL AND THEORETICAL-STUDY; 4 INTERSECTIONS IDENTICAL
NUCLEI; 4 KINETIC AND MECHANISTIC; 4 MECHANISM OF ASSEMBLY;
4 MOLECULAR-STRUCTURE CRYSTAL-STRUCTURE; 4 OPENING METATHESIS
POLYMERIZATION; 4 OXYGEN ATOM TRANSFER; 4 PHOTOINDUCED CHARGE
TRANSFER; 4 PHOTOSYNTHETIC REACTION CENTER; 4 PLATINUM II
COMPLEXES; 4 SCANNING TUNNELING MICROSCOPY; 4 SOLIDE INORGAN
MOLEC; 4 SPONTANEOUSLY ADSORBED MONOLAYERS;

TABLE  12A  TITLE  SINGLE  WORD  FREQUENCIES  -  JACS 

2218  CHEM;  2042  USA;  613  COMPLEXES;  393  SYNTHESIS;  274  JAPAN; 

274  REACTIONS;  237  CHEMISTRY;  213  BIOCHEM;  195  STRUCTURE; 
189  DERIVATIVES;  183  COMPLEX;  182  REACTION;  177  SPECTROSCOPY; 

168  CANADA;  166  NY;  165  MECHANISM;  159  NMR;  153  BINDING; 

153  MOLECULAR;  150  MA;  148  GERMANY;  146  MOLEC;  145  ORGAN; 

138  ACID;  136  II;  132  IL;  129  GAS-PHASE;  127  CAMBRIDGE;  127 
FAC; 126 BOND; 126 PHYS; 124 DNA; 123 MOLECULES; 121

LIGANDS;  120  MODEL;  120  RESONANCE;  116  REACTIVITY;  115 
FRANCE;  115  TX;  115  VOL;  114  FORMATION;  114  IONS;  113  CO; 
113  CRYSTAL-STRUCTURE;  111  STUDY;  110  WATER;  107  RECOGNITION; 
107  SCH;  105  CHARACTERIZATION;  104  PROTEINS;  101  KU; 

TABLE  13A  ABSTRACT  DOUBLE  WORD  FREQUENCIES  -  JACS 

356  KCAL  MOL;  165  AB  INITIO;  139  SPACE  GROUP;  117  RATE 
CONSTANTS;  84  TRANSITION  STATE;  81  H-1  NMR;  80  KJ  MOL;  73 
ELECTRON  TRANSFER;  71  ANGSTROM  BETA;  67  X-RAY  DIFFRACTION; 

64  NMR  SPECTROSCOPY;  64  ROOM  TEMPERATURE;  63  GROUND  STATE; 

62 AQUEOUS SOLUTION; 57 GROUP P2; 56 FREE ENERGY; 55

HYDROGEN  BONDS;  55  INITIO  CALCULATIONS;  55  MOLECULAR  ORBITAL; 
53  CHEMICAL  SHIFT;  53  CRYSTAL  STRUCTURE;  53  POTENTIAL  ENERGY; 
53  PROTON  TRANSFER;  53  RATE  CONSTANT;  52  ANGSTROM  ALPHA; 

52  DOUBLE  BOND;  51  HYDROGEN  BONDING;  50  DOUBLE  DAGGER;  49 
DEGREES  BETA;  49  GAS  PHASE;  47  DEGREES  GAMMA;  46  ISOTOPE 

EFFECTS;  44  EXCITED  STATE;  43  CRYSTAL  STRUCTURES;  42 

HYDROGEN  BOND;  42  RADICAL  CATION;  40  ACTIVE  SITE;  40  SOLID 
STATE;  40  TRANSITION  STATES;  39  FE  III;  39  GOOD  AGREEMENT; 
39  MASS  SPECTROMETRY;  39  MONOCLINIC  SPACE;  39  NMR  SPECTRA; 
38  CHEMICAL  SHIFTS;  38  FORCE  FIELD;  38  MOLECULAR  MECHANICS; 
35  ISOTOPE  EFFECT;  35  TEMPERATURE  DEPENDENCE;  34  BASE  PAIRS; 
34 C-13 NMR; 34 HYDROGEN ATOM; 33 ENERGY SURFACE; 33

EXPERIMENTAL  DATA;  33  RADICAL  CATIONS;  32  ACTIVATION  ENERGY; 
32  AMINO  ACID;  31  BASIS  SETS;  31  DNA  CLEAVAGE;  31 
ELECTRONIC STRUCTURE; 31 ET AL; 31 MOLECULAR DYNAMICS; 31
RING  OPENING;  30  IRON  III;  30  KINETIC  ISOTOPE;  30  PREVIOUSLY 
REPORTED;  30  X-RAY  CRYSTALLOGRAPHY;  29  METAL  IONS;  28  DELTA 
DELTA;  28  EPR  SPECTRA;  28  SIDE  CHAIN;  27  ELECTRON  DENSITY; 

27  EXCITED  STATES;  27  ORBITAL  CALCULATIONS;  27  RESONANCE 

RAMAN;  26  BASIS  SET;  26  CRYSTAL  DATA;  26  FREE  ENERGIES; 

26 SIDE CHAINS; 26 VIBRATIONAL FREQUENCIES; 25 FE II; 25
INITIO MOLECULAR; 25 INTERSYSTEM CROSSING;

TABLE  14A  ABSTRACT  TRIPLE  WORD  FREQUENCIES  -  JACS 

57 SPACE GROUP P2; 53 AB INITIO CALCULATIONS; 38 MONOCLINIC
SPACE GROUP; 35 LEVEL OF THEORY; 29 POTENTIAL ENERGY SURFACE;

27  MOLECULAR  ORBITAL  CALCULATIONS;  25  AB  INITIO  MOLECULAR;  23 
KINETIC  ISOTOPE  EFFECTS;  22  H-1  NMR  SPECTROSCOPY;  22  INITIO 

MOLECULAR  ORBITAL;  21  TRICLINIC  SPACE  GROUP;  17  ION  CYCLOTRON 
RESONANCE;  17  MOLECULAR  MECHANICS  CALCULATIONS;  17  VAN  DER 

WAALS;  15  AGREEMENT  WITH  EXPERIMENT;  15  H-1  NMR  SPECTRA;  15 
INTERPRETED  IN  TERMS;  14  AB  INITIO  METHODS;  14  DETERMINED  BY 
X-RAY;  14  ELECTRON  PARAMAGNETIC  RESONANCE;  14  EXPLAINED  IN 

TERMS;  13  KCAL  MOL  RESPECTIVELY;  13  SINGLE-CRYSTAL  X-RAY 

DIFFRACTION; 13 SPACE GROUP C2; 12 AGREEMENT WITH EXPERIMENTAL;

12  CP  RH  CO;  12  DENSITY  FUNCTIONAL  THEORY;  12  DISCUSSED  IN 
TERMS;  12  LASER  FLASH  PHOTOLYSIS;  12  LEVELS  OF  THEORY;  12 
NUCLEAR  MAGNETIC  RESONANCE;  12  SECOND-ORDER  RATE  CONSTANTS;  11 
DELTAH DOUBLE DAGGER; 11 H-1 AND C-13; 11 HEATS OF FORMATION;
11 ORDERS OF MAGNITUDE; 11 ORTHORHOMBIC SPACE GROUP; 11

SYSTEM  SPACE  GROUP;  10  AB  INITIO  QUANTUM;  10  AMINO  ACID 

RESIDUES;  10  CALF  THYMUS  DNA;  10  CHARACTERIZED  BY  X-RAY;  10 
DELTAS  DOUBLE  DAGGER;  10  FOURIER  TRANSFORM  ION;  10  HYDROGEN 

ATOM  TRANSFER;  10  MOLECULAR  DYNAMICS  SIMULATIONS;  10  PREPARED 
AND  CHARACTERIZED;  10  SINGLE  AND  DOUBLE;  10  SINGLE  CRYSTAL 

X-RAY;  10  TRANSFORM  ION  CYCLOTRON;  10  X-RAY  CRYSTAL  STRUCTURES; 

TABLE  15A  ABSTRACT  SINGLE  WORD  FREQUENCIES  -  JACS 

792 REACTION; 710 ANGSTROM; 620 TWO; 617 DEGREES; 583
COMPLEXES;  526  BOND;  506  STRUCTURE;  500  ENERGY;  498  COMPLEX; 

485  MOL;  479  OBSERVED;  465  CO;  444  GROUP;  424  STATE;  416 
FOUND;  412  FORMATION;  398  REACTIONS;  371  KCAL;  367  NMR;  354 
CALCULATIONS;  354  MOLECULAR;  346  BINDING;  344  DATA;  339  RATE; 

332  ELECTRON;  331  ACID;  327  FORM;  321  II;  319  STRUCTURES; 
306  ION;  297  RING;  297  TRANSFER;  293  RADICAL;  292  HYDROGEN; 

290  EFFECTS;  288  DETERMINED;  288  SOLUTION;  287  SIMILAR;  285 

SPECTRA;  283  DELTA;  278  MODEL;  267  TEMPERATURE;  264  ADDITION; 

262 MOLECULES; 259 SPECIES; 258 DNA; 253 COMPOUNDS; 253

METAL;  251  TRANSITION;  250  BETA;  247  IONS;  246  ANALYSIS; 
246  SURFACE;  237  VALUES;  229  CONSTANTS;  229  LIGAND;  228 

SOLVENT;  227  WATER;  226  EFFECT;  226  PRODUCTS;  225  PH;  223 
GROUPS;  222  MECHANISM;  221  CRYSTAL;  220  FE;  220  X-RAY;  217 
STUDIED;  216  INTERACTIONS;  215  ENERGIES;  215  STUDIES;  214 
CHEMICAL;  214  FORMED;  212  HIGH;  211  RESPECTIVELY;  210 
EXPERIMENTAL; 209 INTERMEDIATE; 207 CALCULATED; 204 RELATIVE;

203  CORRESPONDING;  200  ALPHA; 

THEME  RELATIONSHIPS 

To obtain the theme and subtheme relationships, a phrase
proximity analysis is performed about each theme phrase. Typically,
forty to sixty multi-word phrase themes are selected from a
multi-word phrase analysis of the type shown above. For each theme
phrase, the frequencies of words within +/-50 words of the theme
phrase are computed for every occurrence in the full text. A phrase
frequency dictionary is constructed which shows the phrases closely
related to the theme phrase. Numerical indices are employed to
quantify the strength of this relationship. Both quantitative and
qualitative analyses of each phrase frequency dictionary (hereafter
called a cluster) yield those subthemes closely related to the main
cluster theme.
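
The windowed counting step described above can be sketched as
follows. This is a minimal illustration, not the software used in the
study: the function name cooccurrence_counts, the pre-tokenized input,
and the restriction to single-word themes and members are all
assumptions. It also assumes that each word position is counted once,
even when it lies near several occurrences of the theme.

```python
from collections import Counter


def cooccurrence_counts(tokens, theme, window=50):
    """Count words lying within +/-window tokens of any theme occurrence.

    Hypothetical helper: `tokens` is a list of already-normalized words
    and `theme` is a single word. Each token position is counted at most
    once (an assumption; the source does not say how overlaps are handled).
    """
    theme_positions = [i for i, tok in enumerate(tokens) if tok == theme]
    near = set()
    for pos in theme_positions:
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for k in range(lo, hi):
            if k != pos:  # an occurrence does not count as its own neighbor
                near.add(k)
    # Tally the words found at the collected positions.
    return Counter(tokens[k] for k in near)
```

Calling cooccurrence_counts(text.upper().split(), "CITATION") on a
body of text would return a dictionary-like tally of the kind the
paragraph calls a phrase frequency dictionary. Under these
assumptions, a theme occurrence is itself counted only when another
occurrence of the theme falls inside its window.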

Then, threshold values are assigned to the numerical indices.
These thresholds are used to retain only the phrases most closely
related to the cluster theme (for example, TABLE 16 (theme phrase
CITATION, ABSTRACT DATABASE), following this section, shows part of a
typical filtered cluster from the study).
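
A thresholding step of this kind might look like the sketch below.
The function name filter_cluster, the tuple layout, and the cutoff
value passed in are hypothetical; the source does not state which
threshold values were used. The score computed is the equivalence
index Eij defined in the code legend to TABLE 16, and the result is
sorted by Eij as in that table.

```python
def filter_cluster(rows, min_eij):
    """Keep cluster members whose equivalence index meets a threshold.

    `rows` is a list of (member, c_ij, c_i, c_j) tuples, where c_ij is
    the co-occurrence frequency, c_i the member frequency, and c_j the
    theme frequency. `min_eij` is an illustrative cutoff, not a value
    taken from the study.
    """
    kept = []
    for member, c_ij, c_i, c_j in rows:
        e_ij = (c_ij / c_i) * (c_ij / c_j)  # Eij = Ii * Ij
        if e_ij >= min_eij:
            kept.append((member, e_ij))
    # Strongest associations first, matching the "SORT BY Eij" layout.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Applied to the JOURNALS, INDEX, and SCIENCE rows of TABLE 16 with an
illustrative cutoff of 0.09, this keeps JOURNALS and INDEX and drops
SCIENCE.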

Because  of  space  limitations  in  this  section,  only  two  themes 
were  chosen  for  the  RIA  phrase  proximity  analysis,  and  one  theme  for 
the  JACS  phrase  proximity  analysis.  Peer  review  was  one  obvious  high 
frequency  RIA  theme.  Citation  was  chosen  as  the  other  RIA  theme 
because  of  its  high  frequency,  although  Bibliometrics  could  have  been 
an  appropriate  alternate  theme.  Complexes  was  chosen  as  the  JACS 
theme,  while  Reaction  could  have  been  an  equally  appropriate  theme. 

The  full  text  database  was  split  into  two  databases.  One  was 
the  abstract  narrative,  and  it  was  hoped  that  performing  the  phrase 
proximity  analysis  on  this  database  would  yield  mainly  topical  theme 
relationships. The other database consisted of records (one for each
published paper) containing four fields: author(s), title, journal
name, and author(s) institutional address(es). It was hoped that
performing  the  phrase  proximity  analysis  on  this  database  would  yield 
not  only  topical  theme  relationships  from  the  proximal  title  phrases, 
but  also  relationships  between  technical  themes  and  authors, 
journals,  and  institutions. 


TABLE 16

Theme phrase "CITATION" - ABSTRACT DATABASE - SORT BY Eij

Cij   Ci    Ii         Ij         Eij       CLUSTER MEMBER
            (Cij/Ci)   (Cij/Cj)   (Ii*Ij)

150   386   0.389      0.389      0.1510    CITATION
137   368   0.372      0.355      0.1321    JOURNALS
106   246   0.431      0.275      0.1183    CITATIONS
107   268   0.399      0.277      0.1107    JOURNAL
 94   236   0.398      0.244      0.0970    PAPERS
 65   115   0.565      0.168      0.0952    INDEX
 70   145   0.483      0.181      0.0875    CITED
 91   269   0.338      0.236      0.0798    ARTICLES
 93   313   0.297      0.241      0.0716    SCIENCE


CODE:

Cij IS CO-OCCURRENCE FREQUENCY, OR NUMBER OF TIMES CLUSTER MEMBER
APPEARS WITHIN +/-50 WORDS OF CLUSTER THEME IN TOTAL TEXT;

Ci IS ABSOLUTE OCCURRENCE FREQUENCY OF CLUSTER MEMBER;

Cj IS ABSOLUTE OCCURRENCE FREQUENCY OF CLUSTER THEME;

Ii, THE CLUSTER MEMBER INCLUSION INDEX, IS RATIO OF Cij TO Ci;

Ij, THE CLUSTER THEME INCLUSION INDEX, IS RATIO OF Cij TO Cj;

AND Eij, THE EQUIVALENCE INDEX, IS PRODUCT OF INCLUSION INDEX BASED
ON CLUSTER MEMBER Ii (Cij/Ci) AND INCLUSION INDEX BASED ON CLUSTER
THEME Ij (Cij/Cj).
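
The three indices defined in the legend can be computed directly from
the raw counts. The sketch below is illustrative only; the function
name association_indices is an assumption, not part of the original
study.

```python
def association_indices(c_ij, c_i, c_j):
    """Return (Ii, Ij, Eij) for one cluster member.

    c_ij: co-occurrence frequency of member and theme
    c_i:  absolute occurrence frequency of the cluster member
    c_j:  absolute occurrence frequency of the cluster theme
    """
    if c_i <= 0 or c_j <= 0:
        raise ValueError("occurrence frequencies must be positive")
    i_i = c_ij / c_i  # cluster member inclusion index
    i_j = c_ij / c_j  # cluster theme inclusion index
    e_ij = i_i * i_j  # equivalence index
    return i_i, i_j, e_ij
```

For the JOURNALS row of TABLE 16 (Cij = 137, Ci = 368), and taking
Cj = 386 from the CITATION row (the theme's own frequency), the
function gives Ii = 0.372, Ij = 0.355, and Eij = 0.1321 after
rounding, in agreement with the tabulated values.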

In the following figures, the underlined topic is the cluster
theme. The cluster members were segregated by their values of
Inclusion Indices (Ij and Ii), but due to space limitations, only the
summary relational results are presented. Ij is the ratio of Cij to
Cj, and is the Inclusion Index based on the theme phrase. Ii is the
ratio of Cij to Ci, and is the Inclusion Index based on the cluster
member. The dividing points between high and low Ij and Ii are the
middle of the "knee" of the distribution functions of numbers of
cluster members vs. values of Ij and Ii. All cluster members with Ij
greater than or equal to 0.1 were defined as having high Ij. All
cluster members with Ii greater than or equal to 0.5 were defined as
having high Ii.
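
The segregation rule just stated can be written as a small function.
This is a sketch under the stated cutoffs (0.1 for Ij, 0.5 for Ii);
the function name classify_member and the label strings are
assumptions chosen to match the headings used in the text.

```python
def classify_member(i_i, i_j, ii_cut=0.5, ij_cut=0.1):
    """Assign a cluster member to one of the four Ij/Ii groupings.

    i_i: cluster member inclusion index (Cij/Ci)
    i_j: cluster theme inclusion index (Cij/Cj)
    The default cutoffs are the dividing points quoted in the text.
    """
    ij_level = "HIGH" if i_j >= ij_cut else "LOW"
    ii_level = "HIGH" if i_i >= ii_cut else "LOW"
    return f"{ij_level} Ij {ii_level} Ii"
```

With the TABLE 16 values, INDEX (Ii = 0.565, Ij = 0.168) falls under
HIGH Ij HIGH Ii, while JOURNALS (Ii = 0.372, Ij = 0.355) falls under
HIGH Ij LOW Ii.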

A high value of Ij means that, whenever the theme phrase appears
in the text, there is a high probability that the cluster member will
appear within +/-50 words of the theme phrase. A high value of Ii
means that, whenever the cluster member appears in the text, there is
a high probability that the theme phrase will appear within +/-50
words of the cluster member.

Phrases in the category HIGH Ij HIGH Ii are coupled very
strongly to the theme phrase. Whenever the theme phrase appears,
there is a high probability that the cluster member will be
physically close. Whenever the cluster member appears, there is a
high probability that the theme phrase will be physically close.
In short, whenever either phrase appears in the text, the other is
likely to be physically close.

Consider phrases located under the heading HIGH Ij LOW Ii in
Tables 17 and 18. Whenever the cluster member appears in the text,
there is a low probability that it will be physically close to the
theme phrase. Whenever the theme phrase appears in the text, there
is a high probability that it will be physically close to the cluster
member. This type of situation occurs when the frequency of
occurrence of the cluster member Ci is substantially larger than the
frequency of occurrence of the theme phrase Cj, and the cluster
member and the theme phrase have some related meaning.

Single word phrases have absolute frequencies an order of
magnitude higher than double word phrases. Thus, the phrases under
the heading HIGH Ij LOW Ii are typically high frequency single words.
They are related to the theme phrase but much broader in meaning than
the theme phrase. Only a small fraction of the time that these broad
single words appear will the more narrowly defined double word phrase
theme appear physically close. However, whenever the narrowly
defined double word phrase theme appears, the broader related single
word cluster member will usually appear nearby. The phrases under
this heading can also be viewed as a higher level taxonomy of
technical disciplines related to the theme.

Consider phrases located under the heading LOW Ij HIGH Ii.
Whenever the cluster member appears in the text, there is a high
probability that it will be physically close to the theme phrase.
Whenever the theme phrase appears in the text, there is a low
probability that it will be physically close to the cluster member.
This type of situation occurs when the frequency of occurrence of the
cluster member Ci is substantially smaller than the frequency of
occurrence of the theme phrase Cj, and the cluster member and the
theme phrase have some related meaning. Thus, the phrases under the
heading LOW Ij HIGH Ii tend to be low frequency double and triple
word phrases, related to the theme phrase but very narrowly defined.

A large fraction of the time that these very narrow double and
triple word phrases appear, the relatively broader double word phrase
theme will appear physically close. However, only a small fraction
of the time that the relatively broad double word phrase theme
appears will the narrower double and triple word phrase cluster
member appear. This grouping has the potential for identifying
"needle-in-a-haystack" type thrusts which occur infrequently but
strongly support the theme when they do occur. One of many
advantages of full text over key or index words is this
ability to retain low frequency but highly important phrases, since
the key word approach ignores the low frequency phrases.

TABLE  17  -  RIA 
PEER  REVIEW 


The first grouping analyzed is the BLOCK database; low Ii, high
Ij. The words describe the more generic associations with PEER
REVIEW. The major journals whose RIA articles tend to focus on peer
review are shown to include SCIENCE, NATURE, and BEHAVIORAL AND BRAIN
SCIENCES. The major countries associated with peer review are USA
and ENGLAND. The major users of peer review in this database tend to
represent the medical community (MEDICAL; MEDICAL ASSOCIATION; SCH
MED; MD). In summary, peer review has major emphasis in America and
England, is featured in the major journals of Science, Nature, and
Behavioral and Brain Sciences, and is employed widely in the medical
community.

The second grouping analyzed is the BLOCK database; high Ii, low
Ij. The words describe the more specific associations with PEER
REVIEW. Authors who focus on peer review are shown to include
CHUBIN, HACKETT, CICCHETTI, RUBIN, TRACEY, LOCK, and DICKSON.
Journals closely associated with peer review in this database include
JOURNAL OF CHILD NEUROLOGY, TECHNOLOGY REVIEW, JOURNAL OF PSYCHIATRY,
ANGEWANDTE CHEMIE INTERNATIONAL, and BEHAVIORAL AND BRAIN SCIENCES.
Institutions which appear often with peer review include JOHNS
HOPKINS UNIV, YALE UNIV, SUNY-STONY BROOK, and NEW ZEALAND UNIV.
Subthemes related to peer review include REFORM OPTIONS, MANUSCRIPT
AND GRANT SUBMISSIONS, INTERNAL AND EXTERNAL STANDARDS, SCIENCE
POLICY, PERFORMANCE REVIEW, REFEREES, QUALITY ASSESSMENT, QUALITY
ASSURANCE, and RELIABILITY.

The third grouping analyzed is the ABSTRACT database; low Ii,
high Ij. The generic related themes from this database include the
validity of the peer review process (PROCESS, CRITERIA, QUALITY,
OBJECTIVE, RELIABILITY), the journal focus of peer review
(MANUSCRIPTS, AUTHORS, JOURNALS, ARTICLES, EVALUATION), and the
medical focus of peer review (HOSPITAL, HEALTH, MEDICAL, CLINICAL).

The fourth grouping analyzed is the ABSTRACT database; high Ii,
low Ij. Specific themes include those related to process performance
and quality (DEFICIENCIES, GRIEVANCES, BLINDED PEER REVIEW,
NON-BLINDED PEER REVIEW, FOG INDEX, CONTROL GROUP, SHORTCOMINGS,
READABILITY), those related to the uses and purposes of peer review
(RESEARCH SELECTION, IMPACT EVALUATION, QUALITY ASSESSMENT,
OVERSIGHT, AUDIT, RESEARCH IMPACT), those related to the focus on
selecting journal publications (EDITORIAL PROCESSES, SELECTION
REVIEW, PUBLISHED IN JOURNALS, MANUSCRIPTS), and those related to the
medical focus (TRAUMA CENTER, AMBULATORY CARE, CAESAREAN SECTIONS,
MEDICARE, PRIMARY CARE, PERINATAL).

CITATION 


The fifth grouping analyzed is the BLOCK database; low Ii, high
Ij. The words describe the more generic associations with CITATION.
The major countries appear again to be the USA and ENGLAND. The major
journal appears to be CURRENT CONTENTS, the major author appears to
be GARFIELD, and the major institution appears to be
INST-SCI-INFORMAT. These results show the sensitivity of the
conclusions to the theme phrases chosen for the proximity analysis.
The inclusion of citation classic commentaries in the database gave
heavy weighting to CURRENT CONTENTS, in which they appeared. Had
BIBLIOMETRICS been chosen as a theme word, then in addition journals
such as SCIENTOMETRICS would have appeared prominently, as would
institutions such as HUNGARIAN ACADEMY OF SCIENCES and CHI-RES-INC,
and authors such as NARIN and BRAUN.

The sixth grouping analyzed is the BLOCK database; high Ii, low
Ij. The words describe the more specific associations with CITATION.
The authors closely associated with citations include GARFIELD,
BURCHINSKY, DUPLENKO, HARGENS, WELLJAMSDOROF, and BOTT. The journals
associated with citations include AMERICAN PSYCHOLOGIST, METEORITICS,
CHEMICKE LISTY, SOUTH AFRICAN JOURNAL OF SCIENCE, AMERICAN JOURNAL OF
ROENTGENOLOGY, and SCIENCE TECHNOLOGY AND HUMAN VALUES. Institutions
associated with citations include INST-SCI-INFORMAT,
INST GERONTOL-KIEV, UNIV OF ILLINOIS, and UNIV OF MICHIGAN.
Subthemes related to citation include COUNTS, RATES FREQUENCY,
RANKINGS, INDEXES, LINKS, HIGH IMPACT RESEARCH, IMPACT FACTOR,
JOURNAL ARTICLES, PUBLICATIONS, and CHAPTERS.

The seventh grouping analyzed is the ABSTRACT database; low Ii,
high Ij. The generic related themes from this database include types
of documents cited (PAPERS, ARTICLES, PUBLICATIONS), characterization
of material cited (RESEARCH, SCIENCE, LITERATURE), and yields from
citations (ANALYSIS, PATTERNS, INFORMATION, DATA).

The eighth grouping analyzed is the ABSTRACT database; high Ii,
low Ij. Specific themes include those related to citation focus
areas (CITATION INDEX DATABASE, CITATION MATRIX, CITATION STUDIES,
CITATION RATE, CITATION COUNTS, JOURNAL CITATION REPORTS, MEDIAN
CITATION, CITATIONS PER ARTICLE, CITATION HISTORY, CITATION PROCESS,
CITATION RETRIEVAL, CITATION IMPACT, CITATION FREQUENCY), citation
analysis techniques (MEDIAN CITATION, MEAN CITATION, AVERAGE
CITATION, MEAN VALUE FUNCTION, BIBLIOGRAPHIC COUPLING, ANALYSIS OF
CITATIONS, LOGLINEAR, POISSON PROCESS, RELATIVE INDICATORS, COUNTS,
COCITATION), outputs of citation techniques (RESEARCH FRONTS, HIGHLY
CITED PAPERS, MAPPINGS), and specific technical areas analyzed
(DERMATOLOGY, RADIOLOGY, HEART DISEASE, MARINE BIOLOGY, SAFETY SEATS,
CAPITAL PUNISHMENT, and ASTRONOMERS).

TABLE  18  -  JACS 
COMPLEXES 


The first grouping analyzed is the BLOCK database; low Ii, high
Ij. The words describe the more generic associations with COMPLEXES.
The major countries associated with research into COMPLEXES are USA,
JAPAN, CANADA, ITALY, FRANCE, GERMANY, SPAIN, ENGLAND, and
SWITZERLAND. The major states in the US associated with research
into COMPLEXES are NY, MA, CA, IL, MO, DE, GA, PA, TX, NJ, MI, FL,
and MN. The major research institutions associated with COMPLEXES
are STANFORD, BERKELEY, EMORY, CALTECH, DELAWARE, DUPONT, and
NORTHWESTERN. The major types of COMPLEXES researched include
TRANSITION-METAL, IRON, RUTHENIUM, MOLYBDENUM, RHODIUM, TUNGSTEN, and
PALLADIUM. The major analytical techniques associated with COMPLEXES
include X-RAY, SPECTROSCOPY, NMR, and MASS-SPECTROMETRY. The major
phenomena researched in association with COMPLEXES include SYNTHESIS,
REACTIONS, CRYSTAL STRUCTURE, REACTIVITY, ELECTRON TRANSFER,
ACTIVATION, POLYMERIZATION, CLUSTERS, CATALYSIS, OXIDATION, BINDING,
and INSERTION.

The second grouping analyzed is the BLOCK database; high Ii, low
Ij. The words describe the more specific associations with
COMPLEXES. Organizations closely associated with COMPLEXES research
include SEARLE, HOKKAIDO-UNIV, KYOTO-UNIV, UNIV-PARMA, MERCK-SHARP,
LOS-ALAMOS-NATL-LAB, UNIV-LAUSANNE, EMORY-UNIV, UNIV-DELAWARE,
BERKELEY, UNIV-BARCELONA, UNIV-STRASBOURG, UNIV-SYDNEY,
UNIV-MISSOURI, UNIV-ZARAGOZA, TEXAS A&M, UNIV-CHICAGO, UNIV-FLORIDA,
and BROOKHAVEN-NATL-LAB. Authors closely associated with COMPLEXES
include SOLARI-E, FLORIANI-C, HEINEKEY-DM, COLLMAN-JP, and GOULD-IR.

This grouping clearly emphasizes an institutional focus of where
research is conducted. Both industrial concerns and academic
facilities are emphasized, roughly equally. Significant themes seem
to be, as expected, synthesis and characterization of complexes, but
with a curious attention to fixing gases (MOLECULAR OXYGEN, HYDROGEN,
OR CARBON MONOXIDE) within the complex, perhaps as one step in a
catalysis reaction. Indeed, several of the papers in this grouping
focus on 'reactive' complexes which could clearly be related to
catalytic activity. It is likely that this group focuses heavily on
catalysis as an overall theme.

The third grouping analyzed is the ABSTRACT database; low Ii,
high Ij. The generic related themes from this data base include
understanding the actual structure of complexes, often by application
of instrumental techniques (NUCLEAR MAGNETIC RESONANCE, X-RAY
DIFFRACTION, ULTRAVIOLET OR INFRARED SPECTROSCOPY, others), an
apparent extended examination of copper and iron complexes, and a
weak reference to potential catalysis. There seems to be less
emphasis on the actual formation (synthesis) of the complexes in this
grouping.

The fourth grouping analyzed is the ABSTRACT database; high Ii,
low Ij. Specific themes include those related to formation
(synthesis) of a broad spectrum of metal complexes (focus on metals
such as the platinum group, iron, nickel, copper), many of which
appear to include multi-metal atom centers (e.g., Pt-Pt) and even
mixed multi-metal atom centers (e.g., Pt-Ir), and an emphasis on metal
complexes involving carbon monoxide as a ligand, as well as some
emphasis on unusual carbon-based ligands (e.g., perfluorinated
species). This version of the data base clearly seems to focus on
the chemistry (esp. synthesis) of metal complexes.

The data base shows that the most prolific JACS authors were
Schleyer, Rheingold, Boger, and Trost, who published a total of 49
papers in 1994. These authors published extensively on focused
themes, research topics their groups have likely pursued for several
years before and after 1994. Specifically, Schleyer examined in
depth the synthesis of complexes of alkali metals (e.g., sodium), a
very unusual topic since alkali metals in general form complexes only
rarely, and also applied computer-based technology to elucidate the
structure of such complexes. Rheingold's group published extensively
on complexes involving metal-metal bonds and multi-metal atom
clusters in catalytic systems. Boger, alone among the prolific
authors, focused on bio-active molecules and their synthesis and
reactivity as a function of structure. Trost and his associates
appeared to examine transition metal catalysis of traditional,
well-characterized organic reactions such as the Diels-Alder reaction
(which ordinarily involves no metal species). In general, it is clear
that the four authors are publishing heavily in broad areas of
contemporary organometallic chemistry: synthesis, catalysis,
structure and mechanism determination, and metal-mediation of
bio-active molecules. Indeed, it is clear that these authors are
defining the direction of these themes by their prolific research and
publication programs.

CONCLUSIONS  AND  APPLICATIONS 

This section has provided maps of the RIA and JACS Chemistry
fields, although only a small fraction of the raw data has been
presented. A Competitive Intelligence (CI) professional who has an
interest in these fields has many options for proceeding further from
the map, depending on this person's specific interests. For example,
if the analyst wanted to understand the intellectual foundations of
RIA or JACS Chemistry, then a reading of the most highly cited papers
would be an excellent starting point. If the analyst wanted an
overview of the current literature, then two approaches are available.
The comprehensive literature survey used as the database for the RIA
analysis and reproduced in the back of this Handbook is one avenue.
Another is to peruse the journals which contain the highest frequency
of recent publications. This latter approach is worthwhile since
computerized search approaches don't always identify the full scope
of articles related to the topic of interest, and journals which
focus on such a topical area could yield a cornucopia of useful
information through browsing.

If  the  analyst  wants  to  contact  experts  in  a  particular  thrust 
area  or  technique,  then  contact  could  be  made  with  the  specific 
individuals  or  the  institutions  identified  with  given  techniques  in 
the  theme  relationships  section.  If  the  analyst  wants  to  generate  a 
taxonomy  of  the  S&T  field  based  on  the  technical  relationships  used 
by  the  research  performers,  then  the  approaches  described  in  Appendix 
I  might  prove  helpful.  If  the  analyst  wants  to  utilize  the 
literature  to  help  identify  promising  research  directions,  then  the 
approach  described  in  Appendix  II  might  prove  useful.  The  key 
conclusion  is  that,  starting  from  the  raw  data,  the  analyst  can 
generate  any  cross-cutting  relationships  desired  to  proceed  further 
in  specific  directions  of  personal  interest. 


APPENDIX  I  -  GENERATION  OF  TAXONOMIES 
TAXONOMIES 

The different types of Database Tomography outputs allow
different types of taxonomies, or classifications into component
categories, to be generated. Such categorizations, analogous to the
independent axes of a mathematical coordinate system, allow the
underlying structure of a field to be portrayed more clearly, leading
to more focused analytical and management analyses. There is a major
difference between the taxonomy obtained by this approach and other
taxonomies. The present taxonomy derives from the language and
natural divisions of the database, and therefore database entries are
easily categorized. Other taxonomies are usually generated top-down
and usually attempt to force-fit database subjects into
pre-determined categories.

One  of  the  advantages  of  the  present  full  text  approach, 
relative  to  the  index  or  key  word  approach,  is  that  many  types  of 
taxonomies  can  be  generated:  i.e.,  science,  technology,  institution, 
journal,  person  name,  etc.  Even  within  one  of  these  categories,  such 
as  science,  many  types  of  taxonomies  can  be  developed,  depending  on 
the interests of the analyst and the reason for the taxonomy. Two
separate types of taxonomies will be discussed here.

I  -  PHRASE  FREQUENCY  TAXONOMY 

The  first  type  of  taxonomy  derives  from  the  phrase  frequencies. 
The  authors  examined  the  phrase  frequency  outputs,  then  arbitrarily 
grouped  the  high  frequency  phrases  into  different,  relatively 
independent,  categories  for  which  all  remaining  terms  would  be 
accounted.  Two  examples  of  taxonomies  are  presented:  the  first  is 
from  a  study  of  research  papers  related  to  the  utilization  of  near- 
earth  space,  and  the  second  is  from  a  study  of  reports  from  the 
Foreign  Applied  Sciences  Assessment  Center  (FASAC)  assessing 
different  areas  of  applied  research  in  the  former  Soviet  Union. 
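
The phrase frequency step that feeds this taxonomy can be sketched as
follows. This is a minimal illustration, not the Database Tomography
implementation: the function name, the stopword list, and the sample
text are assumptions made for the example, and the grouping of the
resulting phrases into categories remains a manual step for the analyst.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = frozenset({"the", "of", "a", "and", "in", "to", "from"})

def phrase_frequencies(text, n=2):
    """Count every n-word phrase that contains no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        if not any(w in STOPWORDS for w in gram):
            counts[" ".join(gram)] += 1
    return counts

# Invented sample text standing in for a set of abstracts.
sample = ("Sea surface temperature was measured from the satellite. "
          "The satellite radiometer mapped sea surface temperature.")
counts = phrase_frequencies(sample)
print(counts.most_common(3))
```

The analyst would read the highest-frequency phrases from such a list
and assign them to categories by judgment, as described above.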

EXAMPLE  1  -  NEAR  EARTH  SPACE  RESEARCH  TAXONOMY 

About  5500  research  papers  relating  to  utilization  of  near  earth 
space  were  drawn  from  the  SCI.  Phrase  frequencies  were  generated 
from  the  abstracts,  and  the  high  frequency  phrases  were  arbitrarily 
categorized. These relatively independent categories consist of
Space Platform (E.G., SATELLITE, SPACECRAFT), Satellite Function
(E.G., MAPPING, TRACKING), Satellite Type (E.G., GEOSAT, LANDSAT),
Measuring Instrument (E.G., RADIOMETER, MICROWAVE LIMB SOUNDER),
Region Examined (E.G., SEA, UPPER ATMOSPHERE), Location Examined
(E.G., NORTH ATLANTIC, SOUTHERN HEMISPHERE), Variable Measured (E.G.,
TEMPERATURE, SOIL MOISTURE CONTENT), Variable Derived (E.G.,
RADIATION BUDGET, GENERAL CIRCULATION PATTERN), Analytical Tool
(E.G., DATA PROCESSING, LEAST SQUARES), Products (E.G., TIME SERIES,
TOTAL OZONE MAPPING), and Space Environment (E.G., SOLAR WIND,
MAGNETIC FIELD).


EXAMPLE  2  -  FORMER  SOVIET  UNION  APPLIED  RESEARCH 

About  35  full-length  reports  on  the  status  of  different  areas  of 
applied  research  in  the  Former  Soviet  Union  were  used  as  the 
database.  Phrase  frequencies  were  generated  from  the  reports,  and 
the  high  frequency  phrases  were  arbitrarily  categorized.  An  applied 
research  taxonomy  was  generated.  It  consists  of  Information  (IMAGE 
PROCESSING,  PATTERN  RECOGNITION,  SIGNAL  PROCESSING,  ARTIFICIAL 
INTELLIGENCE,  ETC.),  Physics  (SHOCK  WAVES,  RADIO  WAVES,  QUANTUM 
ELECTRON,  MAGNETIC  FIELD,  CHARGED  PARTICLE  ACCELERATORS,  OPTICAL 
PHASE  CONJUGATION,  ETC.),  Environment  (INTERNAL  WAVES,  OCEANIC 
PHYSICS,  SEA  SURFACE,  IONOSPHERIC  MODIFICATION,  RADIO  WAVE 
PROPAGATION,  ETC.),  and  Materials  (THIN  FILM,  COMPOSITE  MATERIALS, 
FRACTURE  MECHANICS,  SOLID  FUEL  CHEMISTRY,  STRENGTH  MATERIAL,  ETC.). 

II  -  PHRASE  PROXIMITY  TAXONOMY 

The second type of taxonomy derives from the phrase frequency
and proximity analysis. From the phrase frequency analyses, fifty or
sixty high frequency technical phrases were identified as pervasive
themes. The next step was to group these high frequency phrases into
categories of related themes. A proximity analysis was done for each
of these high frequency phrases. A phrase frequency dictionary, or
cluster, was generated for each phrase. This cluster contained those
phrases which were in close physical proximity to the pervasive theme
throughout the text. The degree of overlap among clusters was
computed. Clusters which shared more than a threshold number of
common phrases were viewed as overlapping. These overlapping
clusters were viewed as links in a chain, with the different chains
being relatively independent. Each chain was then defined as a
category of the larger taxonomy. For the study of applied
research in the Former Soviet Union, the following taxonomy, or
megacluster grouping, was generated.

The numbered themes (e.g., 1. IONOSPHERIC HEATING/ MODIFICATION)
are the categories, or megaclusters. The component themes (e.g.,
*RADIO WAVE), preceded by an asterisk (*), are the clusters, or
pervasive themes from the phrase frequency analysis.

1. IONOSPHERIC HEATING/ MODIFICATION
*RADIO WAVE
*WAVE PROPAGATION
*QUANTUM ELECTRON
*IONOSPHERIC MODIFICATION
*PHASE CONJUGATION

2. IMAGE/ OPTICAL PROCESSING
*PARALLEL PROCESSING
*PATTERN RECOGNITION
*IMAGE PROCESSING
*COMPUTER VISION
*DIGITAL COMPUTER
*ARTIFICIAL INTELLIGENCE
*DATA PROCESSING
*COMPUTER SCIENCE
*OPTICAL PROCESSING
*SPATIAL LIGHT MODULATOR
*SIGNAL PROCESSING
*LIQUID CRYSTAL
*LIGHT MODULATOR
*PROGRAMMING LANGUAGES
*INTEGRAL EQUATIONS

3. AIR-SEA INTERFACE
*SURFACE WAVE
*OCEANIC PHYSICS
*INTERNAL WAVE
*SEA SURFACE
*BOUNDARY LAYER
*ATMOS OCEANIC PHYS
*REMOTE SENSING

4. LOW OBSERVABLE
*LOW OBSERVABLE
*THIN FILM

5. EXPLOSIVE COMBUSTION
*KINETICS AND CATALYSIS
*SOLID FUEL
*EXPLOSION AND SHOCK
*SHOCK WAVE
*CHEMICAL PHYSICS
*EXPLOS SHOCK WAVE
*STRENGTH MATER
*FRACTURE MECHANICS
*COMPOSITE MATERIALS

6. PARTICLE BEAMS
*NEUTRAL BEAM
*PARTICLE ACCELERATOR
*ATOMIC ENERGY
*PLASMA PHYSICS
*ELECTRON BEAM
*CHARGED PARTICLE ACCELERATOR
*CHARGED PARTICLE

7. AUTOMATIC/ REMOTE CONTROL
*AUTOMATIC CONTROL
*REMOTE CONTROL

8. FREQUENCY STANDARDS
*FREQUENCY STANDARD
*HYDROGEN MASER

9. RADAR CROSS SECTION
*CROSS SECTION
*ELECTROMAGNETIC WAVE
*RADIO ENGINEERING
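
The chaining of overlapping clusters into megaclusters, as described
before the listing, can be sketched as a connected-components
computation. This is one plausible reading of the procedure, not the
authors' code: the function name and the threshold value are
assumptions, and the cluster contents below are invented for
illustration (only the theme names come from the listing).

```python
def chain_clusters(clusters, threshold):
    """Group themes whose clusters share more than `threshold` phrases.

    `clusters` maps a pervasive theme to the set of phrases found close
    to it. Themes linked directly or through intermediate themes end up
    in the same chain (megacluster).
    """
    themes = list(clusters)
    parent = {t: t for t in themes}

    def find(t):
        # Follow parent links to the representative of t's chain.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(themes):
        for b in themes[i + 1:]:
            if len(clusters[a] & clusters[b]) > threshold:
                parent[find(a)] = find(b)

    chains = {}
    for t in themes:
        chains.setdefault(find(t), []).append(t)
    return sorted(sorted(chain) for chain in chains.values())

# Hypothetical cluster contents, for illustration only.
clusters = {
    "RADIO WAVE": {"ionosphere", "heating", "propagation", "plasma"},
    "WAVE PROPAGATION": {"ionosphere", "propagation", "plasma", "antenna"},
    "IONOSPHERIC MODIFICATION": {"ionosphere", "heating", "plasma", "beam"},
    "HYDROGEN MASER": {"clock", "stability", "frequency"},
    "FREQUENCY STANDARD": {"clock", "stability", "frequency", "cesium"},
}
megaclusters = chain_clusters(clusters, threshold=2)
print(megaclusters)
```

With these invented inputs the three ionospheric themes form one chain
and the two frequency themes form another, mirroring the structure of
megaclusters 1 and 8.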

From the multiword frequency analysis, the science discipline
taxonomy for the FASAC database was defined as Information, Physics,
Environment, and Materials. In terms of the megaclusters,
Information would encompass IMAGE/ OPTICAL PROCESSING and AUTOMATIC/
REMOTE CONTROL; Physics would encompass IONOSPHERIC HEATING/
MODIFICATION, PARTICLE BEAMS, FREQUENCY STANDARDS, and RADAR CROSS
SECTION; Environment would encompass AIR-SEA INTERFACE; and Materials
would encompass EXPLOSIVE COMBUSTION and LOW OBSERVABLE.
Categorizing the database with the megacluster subcategories allows
a re-interpretation of the FASAC database. FASAC can be viewed as a
compendium of those aspects of Former Soviet Union (FSU) science of
interest to the U.S. for strategic and military purposes, rather than
as a microcosm of all of FSU science.


APPENDIX II


IDENTIFICATION  OF  PROMISING  RESEARCH  DIRECTIONS 


INTRODUCTION 

This  Appendix  describes  a  literature-based  approach  to 
identifying  opportunity-driven  promising  directions  in  science  and 
technology.  The  method  is  generic  to  all  fields  of  endeavor  for 
which  a  literature  exists,  is  dual  use  in  the  broadest  sense,  and  has 
the  potential  to  revolutionize  how  promising  directions  are 
identified.  The  approach  is  a  computer-based  analysis  of  the  desired 
literatures  using  appropriate  experts  for  data  interpretation.  The 
proposed  procedure  offers  a  potential  quantum  improvement  over 
earlier related research efforts in the medical literature (10, 11).
The  technique  would  use  the  Database  Tomography  system  described 
previously. 


BACKGROUND 

In the mid-1980s, Don Swanson showed that logical connections in
the existing medical literature can be integrated to help identify
promising medical research directions [Swanson, 1986]. His three
literature-based investigations hypothesized that 1) dietary
fish oil would be helpful in treating Raynaud's Disease; 2) magnesium
is important to migraine; and 3) there is a relationship between
arginine and Somatomedin C. There has been medical corroboration of
Swanson's discoveries [Gordon, 1986].

Gordon and Lindsay used computer-based tools to replicate and
extend Swanson's work [Gordon, 1986]. A more detailed summary of
their work, as well as additional improvements possible with the
authors' approach, is in the Procedure section which follows.
Basically, they used word frequency analysis to examine the
literature of interest, used the high frequency words or phrases
to identify related intermediate literatures, and then used a
combination of high frequency phrases and weak relations between the
phrases to identify the promising research directions from the
related literatures.

For example, they performed a phrase frequency analysis of the
Raynaud's Disease (RD) literature, and found that BLOOD VISCOSITY was
a crucial element in RD. They then performed a phrase frequency and
weak phrase proximity (ratio of phrase appearance in BLOOD VISCOSITY
literature to appearance in total medical literature) analysis of the
BLOOD VISCOSITY literature. Their analyses confirmed Swanson's
results, and showed that FISH OIL and EICOSAPENTAENOIC ACID (one of
fish oil's main chemical constituents) offered substantial promise
as research directions. Experiments performed subsequent to
Swanson's findings have confirmed these predictions.
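
The weak proximity measure mentioned above (the ratio of a phrase's
appearance in the BLOOD VISCOSITY literature to its appearance in the
total literature) can be illustrated with a short sketch. Counting
documents rather than individual occurrences is an assumption made
here, and the function name and the six-record corpus are invented for
the example.

```python
def weak_proximity_ratio(phrase, subset_docs, all_docs):
    """Share of the documents mentioning `phrase` that fall in the subset."""
    phrase = phrase.lower()
    in_subset = sum(phrase in d.lower() for d in subset_docs)
    in_total = sum(phrase in d.lower() for d in all_docs)
    return in_subset / in_total if in_total else 0.0

# Invented stand-in for the total medical literature.
all_docs = [
    "fish oil lowers blood viscosity",
    "blood viscosity and fish oil intake",
    "fish oil and heart rhythm",
    "aspirin and blood viscosity",
    "aspirin for headache",
    "aspirin dosage in adults",
]
# The intermediate literature: records that mention BLOOD VISCOSITY.
viscosity_docs = [d for d in all_docs if "blood viscosity" in d]

fish_oil_ratio = weak_proximity_ratio("fish oil", viscosity_docs, all_docs)
aspirin_ratio = weak_proximity_ratio("aspirin", viscosity_docs, all_docs)
print(fish_oil_ratio, aspirin_ratio)
```

A phrase with a high ratio is concentrated in the intermediate
literature rather than spread across the whole database.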

The authors believe this strong dependence on high frequency
phrases, with only late-stage employment of the weak proximity
condition, severely constrains the technique's potential. Based on
the authors' database analyses of the past five years, it was found
that the strong physical proximity of phrases in text is of equal
importance to the occurrence frequency of those phrases when
constructing structural maps of science and technology. In fact, for
identifying promising research and technology directions, strong
phrase proximity may be far more important than phrase frequency.
High frequency phrases tend to reflect both the obvious and the
mainstream efforts, while low frequency phrases located in close
proximity to phrases of topical interest have a much greater chance of
uncovering 'needles-in-a-haystack'. In addition, as was shown in a
recent paper, the full power of the authors' analytic approach
requires the use of both phrase frequency and strong phrase proximity
at every iterative step in the analysis [Kostoff, 1997f].

The authors' approach uses the Database Tomography tools of
phrase frequency analysis in conjunction with strong phrase proximity
analysis. This allows identification not only of the mainstream
high-frequency relationships, but of the less-explored low-frequency
high-proximity relationships as well. This provides the capability to
identify the most promising science and technology directions with
the fewest restrictions.


PROCEDURE 

The  remainder  of  this  section  summarizes  Gordon  and  Lindsay's 
work  on  literature-based  discovery,  and  shows  how  the  combination  of 
Database  Tomography  and  their  approach  would  eliminate  the  major 
deficiencies  in  their  present  approach.  This  combined  approach  could 
have  tremendous  payoff  in  many  technical  and  non-technical  fields. 

The initial summary of Gordon and Lindsay's work will focus on
their example of Raynaud's Disease (RD). The objective of their
approach is to find something in the published literature that will
point to new directions for the treatment, cure, etc., of RD. They use
the following approach. First, search the literature (MEDLINE, in their
particular case) to retrieve all documents which contain Raynaud* in
the appropriate fields (560 documents). Then, using word frequency
analysis (including different types of word frequency analysis
statistics), identify high frequency terms related to RD.

For example, they find BLOOD is such a term. They then identify
the subset of the Raynaud documents which contain blood-related terms
(BLOOD FLOW, BLOOD VISCOSITY, PLATELET AGGREGATION, ETC.), and repeat
the word frequency analysis on this subset (232 documents). They
find that ideas related to BLOOD FLOW should be pursued further. In
particular, they find that BLOOD VISCOSITY is related to BLOOD FLOW,
is a possible cause of impaired flow, and is statistically prominent
in its own right.

The next step is a crucial part of their approach. They go back into
the literature, and search for all records related to BLOOD
VISCOSITY, whether or not they are related to RD. After performing
a word frequency analysis and a weak proximity analysis on the
retrieved information, they prune the list of terms to 115 which they
judge to be initial candidates for discovery. The details of the
pruning are not relevant for what follows here. Of the 115 terms,
they find that only 34 did not appear in the list of the original 560
Raynaud's records. These 34 terms are what they call disjoint from
Raynaud's, and are therefore true candidates for discovery. They
finally arrive at FISH OIL and EICOSAPENTAENOIC ACID (one of fish
oil's main chemical constituents) as the discovery items.
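
The disjointness test described above amounts to a set difference
between the candidate terms and the terms already present in the
Raynaud's records. The sketch below illustrates the idea only; the
function name and the short term lists are hypothetical, not the 115
and 34 terms of the original study.

```python
def disjoint_candidates(candidate_terms, source_terms):
    """Return candidate terms that never appear among the source terms."""
    source = {t.upper() for t in source_terms}
    candidates = {t.upper() for t in candidate_terms}
    return sorted(candidates - source)

# Hypothetical terms pruned from the BLOOD VISCOSITY literature.
viscosity_terms = ["FISH OIL", "EICOSAPENTAENOIC ACID",
                   "PLATELET AGGREGATION", "BLOOD FLOW"]
# Hypothetical terms already found in the Raynaud's records.
raynaud_terms = ["BLOOD FLOW", "PLATELET AGGREGATION", "BLOOD VISCOSITY"]

discovery_candidates = disjoint_candidates(viscosity_terms, raynaud_terms)
print(discovery_candidates)
```

Only terms absent from the starting literature survive, which is what
makes them candidates for a connection not yet reported there.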

The purpose of their study was to replicate Swanson's approach
for identifying promising directions in medical research, which he
had carried out ten years earlier without computerized information
retrieval techniques. They did replicate it, and they also showed that
follow-up medical research has corroborated Swanson's discoveries.
Thus, their method and Swanson's appear to have great promise in
mining the medical literature for promising new directions. What,
then, are the deficiencies?

Their  approach  is  based  mainly  on  word  frequency  analysis,  and 
the  use  of  high  frequency  terms  to  guide  promising  directions.  Only 
in  the  last  step  of  their  analysis  do  they  employ  a  weak  proximity 
analysis  condition.  Based  on  the  authors'  experience,  high  word 
frequencies  tend  to  reflect  mainstream  research  approaches  heavily 
published  in  the  literature.  Use  of  high  frequency  terms  at  most 
stages  of  the  analysis  will  effectively  eliminate  concepts,  accepted 
or  alternative,  which  have  received  little  support  in  the  past  and 
are  lightly  represented  in  the  literature. 

What is required for a more complete computer-based analytical
tool is a method that gives equal emphasis to low frequency terms
and high frequency terms. In practice, the low frequency term
analyzer  would  probably  be  more  valuable  for  identifying  promising 
opportunities.  High  frequency  relationships  tend  to  be  more  obvious, 
and  probably  many  of  these  types  of  relationships  are  known  without 
use  of  the  computerized  analysis.  According  to  Gordon  and  Lindsay, 
Swanson  was  able  to  hypothesize  the  promising  opportunities  without 
the  use  of  the  computerized  analysis.  While  high  frequency 
relationships  are  useful  in  mapping  structural  relationships  among 
science  and  technology  disciplines,  as  has  been  shown  with  the 
Database Tomography efforts, it is the low frequency relationships
which have the greater potential of finding the 'needles in a
haystack'.

However,  while  there  are  relatively  few  high  frequency 
relationships,  and  the  analytical  problem  is  relatively  bounded, 
there  are  very  large  numbers  of  low  frequency  relationships.  The 
problem  becomes  pragmatically  intractable  if  no  further  conditions 
are  placed  on  the  low  frequency  relationships.  The  additional 
conditions  on  the  low  frequency  relationships  required  to  make  the 
problem tractable derive from the word proximity analyses. Examine
only those low frequency terms which are also strongly related to the
dominant themes of the problem. In other words, examine those low
frequency terms which have high inclusion indices (number of
appearances within some domain around the dominant term, divided by
the number of appearances in the total text) relative to the dominant
terms. Thus, whenever these low frequency terms appear in the text,
they are usually located physically close to the dominant themes.
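
The inclusion index defined above can be computed directly from word
positions. The sketch below is one possible reading, not the Database
Tomography code: the "domain" is assumed to be a fixed window of words
measured between phrase starting positions, and the function name,
window size, and sample text are all invented for the example.

```python
import re

def inclusion_index(text, term, dominant, window=10):
    """Fraction of `term` occurrences lying within `window` words
    of some occurrence of the `dominant` phrase."""
    words = re.findall(r"[a-z]+", text.lower())
    term_words = term.lower().split()
    dominant_words = dominant.lower().split()

    def starts(phrase):
        # Starting word positions of every occurrence of the phrase.
        size = len(phrase)
        return [i for i in range(len(words) - size + 1)
                if words[i:i + size] == phrase]

    term_pos = starts(term_words)
    dominant_pos = starts(dominant_words)
    if not term_pos:
        return 0.0
    near = sum(any(abs(i - j) <= window for j in dominant_pos)
               for i in term_pos)
    return near / len(term_pos)

# Invented text: one FISH OIL mention near BLOOD VISCOSITY, one far away.
text = ("Blood viscosity fell after fish oil was given. "
        + "filler " * 20
        + "Fish oil is also sold as a supplement.")
index = inclusion_index(text, "fish oil", "blood viscosity", window=10)
print(index)
```

A low frequency term with an index near 1 appears almost exclusively
next to the dominant theme, which is the condition used to make the
low frequency search tractable.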

The  Raynaud  example  will  now  be  used  to  show  how  Database 
Tomography  in  conjunction  with  Gordon  and  Lindsay's  method  could  have 
worked.  Using  DT,  two  major  pathways  could  have  been  examined,  where 
Gordon  and  Lindsay  examined  only  one.  For  the  first  pathway,  use 
Gordon  and  Lindsay's  database  and  replicate,  using  word  frequency 
analysis, that BLOOD VISCOSITY appears important. Examine the BLOOD
VISCOSITY  literature  further,  as  they  did.  Then,  do  a  word  frequency 
analysis  of  the  BLOOD  VISCOSITY  literature,  and  identify  the  high 
frequency  terms. 

At this point, perform a strong word proximity analysis for
BLOOD VISCOSITY on the retrieved blood viscosity literature.
Identify (using the numerical indicators from the proximity analysis)
those terms which, when they appear in the blood viscosity
literature, are located physically close to BLOOD VISCOSITY. Thus,
for argument's sake, FISH OIL may appear 100 times in the blood
viscosity literature (and not in the RAYNAUD* literature; keep the
requirement of disjointness), but in only 30 of those times does it
appear physically close to BLOOD VISCOSITY. It would have an
inclusion index of 30/100 = 0.3. However, a potential low frequency
term like VISUALIZATION may appear only 5 times in the BLOOD
VISCOSITY literature (again, not in the RAYNAUD* literature), but in
4 of those times it appears physically close to BLOOD VISCOSITY. It
would have an inclusion index of 4/5 = 0.8.

Then, investigate both FISH OIL (high frequency and low
inclusion) and VISUALIZATION (high inclusion and low frequency)
further, with the help of medical experts, for promising research
directions.
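
The selection of both kinds of candidate can be sketched with the
for-argument's-sake counts used above (FISH OIL: 100 appearances, 30
close; VISUALIZATION: 5 appearances, 4 close). The function name and
the two cutoff values are hypothetical; the text does not specify
thresholds, and in practice the choice would rest with the analyst and
the medical experts.

```python
def select_candidates(stats, min_frequency, min_inclusion):
    """Keep terms that are either frequent or tightly bound to the theme.

    `stats` maps a term to (total appearances, appearances close to the
    dominant theme). Returns term -> (inclusion index, reasons kept).
    """
    picked = {}
    for term, (total, near) in stats.items():
        index = near / total
        reasons = []
        if total >= min_frequency:
            reasons.append("high frequency")
        if index >= min_inclusion:
            reasons.append("high inclusion")
        if reasons:
            picked[term] = (index, reasons)
    return picked

# Illustrative counts taken from the example in the text.
stats = {"FISH OIL": (100, 30), "VISUALIZATION": (5, 4)}
# Hypothetical cutoffs chosen only to separate the two cases.
picked = select_candidates(stats, min_frequency=50, min_inclusion=0.75)
for term, (index, reasons) in picked.items():
    print(term, round(index, 2), reasons)
```

A frequency-only filter would have dropped VISUALIZATION; keeping both
criteria retains the low frequency, high inclusion term for expert
review.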

For  the  second  pathway,  perform  a  strong  word  proximity  analysis 
on  the  initial  RD  literature.  Based  on  the  results  of  this  analysis, 
define  a  promising  intermediate  literature,  analogous  to  the  BLOOD 
VISCOSITY  literature  on  the  first  pathway.  Perform  word  frequency 
and  strong  proximity  analyses  on  this  intermediate  literature,  and 
interpret  the  data  with  the  support  of  medical  experts  to  arrive  at 
(hopefully)  further  promising  research  directions. 


IX-A.


DATABASE  TOMOGRAPHY  FOR  INFORMATION  RETRIEVAL 


This section includes contributions from DR. RONALD N. KOSTOFF,
OFFICE  OF  NAVAL  RESEARCH;  MR.  HENRY  J.  EBERHART,  NAVAL  AIR  WARFARE 
CENTER,  WEAPONS  DIVISION  CHINA  LAKE;  MR.  DARRELL  RAY  TOOTHMAN,  DSTI, 
INC. 


ABSTRACT 

This  section  describes  an  iterative  full-text  information 
retrieval  approach  using  Database  Tomography  (DT)  and  Simulated 
Nucleation (SN). The method generates search terms from the language
and  context  of  the  text  authors,  and  is  sufficiently  flexible  to 
apply  to  a  variety  of  databases.  It  provides  improvement  to  the 
search  strategy  and  related  results  as  the  search  progresses,  not 
only  adding  relevant  records  to  the  information  retrieved,  but 
subtracting  non-relevant  records  as  well. 

As  shown  previously  in  sections  IV-C  and  IX,  Database  Tomography 
is  an  information  extraction  and  analysis  system  which  operates  on 
textual  databases.  Its  primary  use  to  date  has  been  to  identify 
pervasive  technical  thrusts  and  themes,  and  the  interrelationships 
among  these  themes  and  sub-themes,  which  are  intrinsic  to  large 
textual  databases.  Its  two  main  algorithmic  components  are 
multi-word  phrase  frequency  analysis  and  phrase  proximity  analysis. 

Simulated  Nucleation,  the  name  given  to  the  form  of  Database 
Tomography  adapted  to  information  retrieval,  derives  in  concept 
from  the  growth  of  materials.  In  Simulated  Nucleation  for 
information  retrieval,  a  small  core  group  of  documents  relevant  to 
the  topic  of  interest  is  identified.  The  main  algorithmic 
components  of  Database  Tomography,  phrase  frequency  and  phrase 
proximity  analyses,  operate  on  this  core  group  of  documents. 
Patterns  of  word  combinations  in  existing  fields  are  identified, 
new  search  term  combinations  which  follow  the  newly  identified 
patterns  are  generated,  and  the  process  is  repeated.  In  addition, 
patterns  of  word  combinations  which  reflect  extraneous  non-relevant 
material  are  identified,  and  search  terms  which  have  the  ability  to 
remove  non-relevant  documents  from  the  database  are  inserted. 

Thus,  Simulated  Nucleation  operates  in  a  self-correcting  cybernetic 
homeostatic  mode,  and  continually  expands  the  coverage  and  improves 
the  quality  of  the  core  database.  This  iterative  procedure 
continues  until  convergence  is  obtained,  where  relatively  few  new 
documents  are  found  or  few  non-relevant  documents  are  identified, 
even though new search terms are added. An application is
described in which a database of journal articles focused on
near-earth Space Science and Technology is developed from the
Science Citation Index.
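
The iterative loop described in this abstract can be sketched in
simplified form. This is an illustration under stated assumptions, not
the Simulated Nucleation implementation: the relevance judgment, which
in practice comes from an analyst, is replaced by a stand-in function;
phrases are limited to two-word phrases; and all names, the stopword
list, and the five-record corpus are invented for the example.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"of", "the", "and", "in"})  # hypothetical list

def bigrams(doc):
    """Two-word phrases in a document, skipping those with stopwords."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)
            if not STOPWORDS & set(words[i:i + 2])}

def simulated_nucleation(corpus, seed_terms, is_relevant, max_rounds=10):
    include, exclude = set(seed_terms), set()
    previous = None
    for _ in range(max_rounds):
        retrieved = [d for d in corpus
                     if any(t in d.lower() for t in include)
                     and not any(t in d.lower() for t in exclude)]
        if retrieved == previous:  # convergence: nothing added or removed
            break
        previous = retrieved
        good = Counter(p for d in retrieved if is_relevant(d)
                       for p in bigrams(d))
        bad = Counter(p for d in retrieved if not is_relevant(d)
                      for p in bigrams(d))
        # Add phrases recurring in relevant records; exclude phrases
        # seen only in non-relevant records.
        include |= {p for p, n in good.items() if n >= 2 and p not in bad}
        exclude |= {p for p in bad if p not in good}
    return previous, include, exclude

corpus = [
    "satellite remote sensing of sea surface temperature",
    "satellite remote sensing of soil moisture",
    "airborne remote sensing of sea ice",
    "satellite television broadcast rights",
    "cable television broadcast rights",
]
# Stand-in for the analyst's relevance judgment.
judge = lambda doc: "remote sensing" in doc
docs, include, exclude = simulated_nucleation(corpus, ["satellite"], judge)
print(docs)
```

In this toy run the seed term SATELLITE first retrieves one
non-relevant record; the next pass adds REMOTE SENSING as a search
term, which brings in the airborne record, and excludes the television
phrases, after which the retrieved set stops changing.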


INTRODUCTION 

Over the past decade, with the growth and expansion of
electronic storage media, there has been a virtual explosion of
data readily available. In particular, use of CD-ROMs and the
Internet has provided overwhelming data resources to the average
user. Standard database search approaches tend to cast either too
wide a net or too fine a net to make optimal use of this
information. The central problem with most standard search
approaches is that the analyst hypothesizes what the search terms
should be in the context of the application, rather than using the
database to provide the search terms appropriate to the context in
which they are actually embedded. Even in those databases which
provide an on-line dictionary of terms used (typically single word
only), the analyst is unable to ascertain the context in which
those words are employed, and therefore cannot predict if the theme
of the article targeted by the search term used is the theme
desired. The search approaches which try to approximate context
better by weighting different search terms, and then using a figure
of merit to select documents which effectively contain more and a
wider variety of the weighted search terms, still have the
limitation of being based on analyst hypotheses.

A  high  quality  information  retrieval  approach  should  not  only 
be  able  to  yield  the  search  terms  from  the  language  and  context  of 
the  authors  rather  than  from  the  language  of  the  searcher,  but 
should have other desirable properties. It should be able to work
efficiently  on  a  variety  of  different  databases.  Some  databases 
have  keywords;  most  don't.  Some  databases  provide  authors;  some 
don't.  Some  databases  provide  references;  some  don't.  Finally, 
analogous  to  a  neural  network's  operation,  a  high  quality 
information  retrieval  approach  should  be  able  to  improve  the  search 
strategy  and  results  as  time  proceeds.  This  section  describes  an 
information retrieval approach which has these desirable
properties,  and  more. 

BACKGROUND 

As  part  of  a  larger  project,  the  authors  needed  to  perform  a 
literature  survey  of  published  science  and  technology  papers 
related  to  utilization  of  near-earth-space.  A  survey  of  the 
information  retrieval  literature  to  identify  an  efficient  query 
concept  produced  very  mixed  results.  Most  of  the  information 
retrieval  papers  tended  to  be  very  abstract,  contained  little 
detail  that  would  provide  guidance  in  conducting  an  actual  query, 
and  appeared  marginally  related  to  the  much  less  esoteric  methods 
used  by  actual  information  retrieval  practitioners  (librarians, 
etc.). In addition, most of these published information retrieval
papers focused on the use of search terms derived from extrinsic
sources, rather than from the language used by the database text
authors. The authors of the present section felt that these
deficiencies  would  probably  limit  the  efficiency  and 
comprehensiveness  of  the  search  process  and  final  results. 

Since  the  authors  had  been  developing  database  analysis 
techniques  based  on  Term  Co-occurrences  over  the  past  few  years,  it 
was  decided  to  see  whether  these  techniques  could  be  adapted  and 
expanded  to  address  the  literature  survey  problem.  To  achieve  the 
objective  of  developing  the  high  quality  information  retrieval 
approach whose properties were described in the previous section,
the  Term  Co-occurrence  retrieval  component  already  developed  by  the 
authors  would  have  to  be  embedded  in  a  Relevance  Feedback  structure 
with  Query  Expansion.  An  ancillary  objective  was  to  develop  and 
articulate  this  novel  information  retrieval  approach  such  that  it 
could  easily  be  employed  by  real-world  practitioners. 

The  remainder  of  this  Background  section  outlines  the  major 
efforts  that  have  been  performed  in  Term  Co-Occurrence,  Co-Word 
Analysis,  Query  Expansion,  and  Relevance  Feedback,  and  addresses 
some of their limitations. These four topics are not distinct, but
have substantial overlap, and this is reflected in the historical
survey.


TERM  CO-OCCURRENCE  AND  CO-WORD  ANALYSIS 

Term  Co-occurrence  has  its  roots  in  co-word  analysis  and 
computational  linguistics.  Co-word  analysis  utilizes  the  proximity 
of  words  and  their  frequency  of  co-occurrence  in  some  domain 
(sentence,  paragraph,  paper,  etc.)  to  estimate  the  strength  of 
their  relationship.  When  applied  to  the  literature  in  a  technical 
field,  co-word  analysis  allows  a  map  of  the  relationship  among 
technical  themes  to  be  constructed.  A  history  of  co-word  analysis 
applied  to  research  policy  issues,  its  origins  in  computational 
linguistics,  and  its  limitations  due  to  previous  dependence  on  the 
sole  use  of  key  words  and  index  words,  can  be  found  in  recent 
review articles and references [Kostoff, 1991d, 1992a, 1993c].
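
The basic co-word count described above can be sketched as follows,
taking the sentence as the co-occurrence domain. The function name,
the term list, and the sample text are assumptions made for the
example; real co-word studies typically normalize the raw counts into
association strengths before mapping.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(text, terms):
    """Count how often each pair of terms shares a sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = sorted(t for t in terms if t in sentence)
        counts.update(combinations(present, 2))
    return counts

# Invented sample text for illustration.
text = ("Radio wave heating modifies the ionosphere. "
        "Radio wave propagation depends on the ionosphere. "
        "Thin film coatings reduce reflection.")
terms = ["radio wave", "ionosphere", "thin film"]
pairs = cooccurrence_counts(text, terms)
print(pairs.most_common())
```

Pairs with high counts would be drawn close together on a map of
technical themes, while terms that never share a domain, such as THIN
FILM here, stay unlinked.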

TERM  CO-OCCURRENCE  AND  QUERY  EXPANSION 

Term  Co-occurrence  in  information  retrieval  can  be  used  to 
expand  on  an  initial  query,  and  the  additional  query  terms  allow 
the  retrieval  of  relevant  documents  that  would  not  have  been 
retrieved  with  the  initial  query.  These  additional  terms  could 
also  be  used  to  remove  irrelevant  documents.  Traditionally,  this 
form  of  query  expansion  has  been  carried  out  by  means  of  thesauri 
and  controlled  vocabularies.  The  construction  of  these  is  time- 
consuming  and  expensive.  Additionally,  these  extra  terms  are 
analyst-generated,  rather  than  generated  from  the  phrases  used  in 
the  searched  database. 

Studies related to the use of Term Co-occurrence in
information retrieval can be traced back to at least the 1960s
[Maron, 1960; Stiles, 1961; Lesk, 1969]. These early experiments
demonstrated the potential of Term Co-occurrence data for the
identification of search term variants, and eventually led to the
conclusion that query expansion produced the greatest improvement in
performance when the original query gave reasonable retrieval
results, whereas expansion was less effective when the original
query had performed badly. This is in accord with the Association
Hypothesis: "If an index term is good at discriminating relevant
from non-relevant documents then any closely associated index term
is also likely to be good at this" [Van Rijsbergen, 1979].

More recent work on Query Expansion related to Term
Co-occurrence has been based on probabilistic models of the retrieval
process and has tried to relax some of the strong assumptions of
term statistical independence that normally need to be invoked if
probabilistic retrieval models are to be used. Results have been
mixed; it was not found possible to obtain consistent improvements
in performance by the use of any of the query expansion methods
[Croft, 1979; Robertson, 1976; Smeaton, 1983]. As will be shown
later in this section, this has not been the experience of the
authors. The difference may be due to the authors' sole reliance on
natural language expressions from the database for Query Expansion
terms, rather than on external sources for these additional terms.

RELEVANCE  FEEDBACK  AND  QUERY  EXPANSION 

Relevance  Feedback  is  a  controlled  process  for  query 
reformulation.  The  main  idea  consists  of  choosing  important  terms, 
or  expressions,  attached  to  certain  previously  retrieved  documents 
that  have  been  identified  as  relevant  by  the  users,  and  of 
enhancing  the  importance  of  these  terms  in  a  new  query  formulation 
[Salton, 1990]. Analogously, terms included in previously
retrieved  non-relevant  documents  could  be  deemphasized  in  any 
future  query  formulation. 

Initially,  the  Relevance  Feedback  implementations  were 
designed  for  queries  and  documents  in  vector  form;  i.e.,  query 
statements  consisting  of  sets  of  possibly  weighted  search  terms 
used without Boolean operators [Rocchio, 1971; Salton, 1971].
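
A vector-form feedback step of the kind cited above is commonly
written as a Rocchio-style update: the new query is the old query plus
a weighted average of the relevant document vectors, minus a weighted
average of the non-relevant ones. The sketch below shows that general
form only; the text does not give the formula, so the function name,
the coefficient values, the clipping of negative weights, and the toy
vectors are all assumptions.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight query terms from judged documents (term -> weight dicts)."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)

    def centroid(docs, term):
        # Average weight of the term across a set of documents.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid(relevant, term)
                  - gamma * centroid(non_relevant, term))
        if weight > 0:  # drop terms pushed to zero or below
            updated[term] = weight
    return updated

# Invented query and judged documents for illustration.
query = {"satellite": 1.0}
relevant = [{"satellite": 1, "remote": 1, "sensing": 1},
            {"satellite": 1, "remote": 1}]
non_relevant = [{"satellite": 1, "television": 1}]
new_query = rocchio(query, relevant, non_relevant)
print(sorted(new_query.items()))
```

Terms from relevant documents enter the query with positive weight,
while a term seen only in non-relevant documents (TELEVISION here)
receives a negative weight and is dropped.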

Since  the  1980s,  Relevance  Feedback  methods  have  been  applied  also 
to  Boolean  query  formulations,  where  the  process  incorporates  term 
conjuncts  (derived  from  previously  retrieved  relevant  documents) 
into revised query formulations [e.g., Salton, 1985].

More recently, Relevance Feedback approaches with
probabilistic information retrieval based on document components
have been incorporated into artificial neural networks [e.g., Kwok,
1995]. This approach recognizes the intrinsic relevance feedback
operation  of  artificial  neural  networks,  and  the  natural 
application  to  the  information  retrieval  process.  Performance  with 
feedback  improved  substantially  over  the  no  feedback  case. 

Another recent study, which has an important bearing
on the present section, focused on determining the retrieval
effectiveness of search terms identified by users and
intermediaries from retrieved items during term relevance feedback.
Results  show  that  terms  selected  from  particular  database  fields  of 
retrieved  items  during  term  relevance  feedback  (TRF)  were  more 
effective  than  search  terms  from  the  intermediary,  database 
thesauri  or  users'  domain  knowledge  during  the  interaction.  The 
study  concludes  that  more  focus  on  the  practice  of  database 
searching  and  on  the  origins  of  the  terms  used  for  the  feedback 
process  is  necessary  [Spink,  1995]. 

Also  recently,  use  of  local  context  analysis  [Xu,  1996],  which 
combines  global  analysis  [Jing,  1994;  Callan,  1995]  and  local 
feedback [Attar, 1977], has generated effective information
retrieval results. In this combined approach, noun groups are used
as  concepts  and  concepts  are  selected  based  on  co-occurrence  with 
query  terms.  Concepts  are  chosen  from  the  top-ranked  documents, 
similar  to  local  feedback,  but  the  best  passages  are  used  instead 
of  whole  documents.  An  algorithm  is  used  to  rank  the  concepts,  and 
the  query  is  then  expanded. 

The  overwhelming  majority  of  the  reported  Relevance  Feedback 
studies  for  Query  Expansion  appear  to  have  focused  on  the 
mathematical  operations  and  cognitive  aspects  of  the  feedback 
process.  While  this  focus  has  resulted  in  many  of  the  process 
innovations  and  advances,  it  has  been  limited  in  eliminating  the 
inconsistency  of  the  Relevance  Feedback  process  results. 

Relatively  few  innovative  approaches  have  been  applied  to 
identifying  more  appropriate  sources  of  expansion  terms.  The 
present  section  focuses  mainly  on  the  expansion  term  sources. 

In  the  remainder  of  this  section,  the  adaptation  and 
utilization  of  Database  Tomography  for  information  retrieval  (a 
process  called  Simulated  Nucleation)  is  presented.  Finally,  an 
application  of  the  procedure  to  generate  a  database  of  journal 
articles  on  the  science  and  technology  available  for  near-earth- 
space  missions  is  described  in  detail. 

UTILIZATION  OF  DATABASE  TOMOGRAPHY  FOR  INFORMATION  RETRIEVAL 

Database  Tomography  applied  to  information  retrieval  has  all 
the desirable qualities of the high quality information retrieval
process  listed  in  the  Introduction,  and  more.  It  will  operate  on 
any  textual  database  in  any  language.  It  requires  text  only,  but 
will  be  enhanced  with  use  of  titles,  keywords,  references,  etc.  It 
works  directly  from  the  language  of  the  authors  of  the  database's 
contents,  and  improves  the  search  strategy  and  product  with  time. 
Its value increases as the size of the desired retrieval increases.

Simulated  Nucleation,  the  name  given  to  the  form  of  Database 
Tomography  adapted  to  information  retrieval,  derives  in  concept 
from  the  growth  of  materials.  A  core  nucleus  is  developed,  the 
properties  of  this  nucleus  are  identified,  and  then  similar 
material  is  added  onto  the  nucleus  as  time  develops  until  the 
desired  amount  of  material  is  obtained.  The  growth  process  is  then 
terminated. 

In  Simulated  Nucleation  for  information  retrieval,  the  purpose 
is  to  provide  a  tailored  database  of  retrieved  documents,  which 
contains  all  relevant  documents  from  the  larger  literature,  but 
which  also  contains  a  minimal  amount  of  non-relevant  documents.  In 
the  initial  step  of  Simulated  Nucleation,  a  small  core  group  of 
documents  relevant  to  the  topic  of  interest  is  identified.  An 
inherent  assumption  is  then  made  that  the  word  patterns  and 
combinations  in  this  core  group  would  be  found  to  occur  in  other 
relevant  documents,  and  therefore  these  word  patterns  and 
combinations  can  be  used  to  expand  the  search  query.  The  main 
algorithmic  components  of  Database  Tomography,  phrase  frequency  and 
phrase  proximity  analyses,  operate  on  this  core  group  of  documents. 

Patterns of word combinations in existing fields are identified,
new search term combinations which follow the newly identified
patterns  are  generated,  and  the  process  is  repeated.  In  addition, 
patterns  of  word  combinations  which  reflect  extraneous  non-relevant 
material  which  may  have  been  introduced  are  identified,  and  search 
terms  which  have  the  ability  to  remove  non-relevant  documents  from 
the  database  are  inserted.  Thus,  Simulated  Nucleation  operates  in 
a  self-correcting  cybernetic  homeostatic  mode,  and  continually 
expands  the  coverage  and  improves  the  quality  of  the  core  database. 
This  iterative  procedure  continues  until  convergence  is  obtained, 
where  relatively  few  new  documents  or  non-relevant  documents  are 
found  even  though  new  search  terms  are  added. 
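
The iterative procedure described above can be sketched on a toy corpus of titles. This is a minimal sketch under stated assumptions, not the Database Tomography software: relevance judgments come from a caller-supplied function standing in for the analyst, only adjacent word pairs are used as patterns, and the names simulated_nucleation, retrieve, and min_count are hypothetical.

```python
from collections import Counter


def bigrams(text):
    words = text.upper().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def contains_phrase(text, phrase):
    return f" {phrase} " in f" {text.upper()} "


def retrieve(corpus, include, exclude):
    """Indices of documents with any include phrase and no exclude phrase."""
    return {
        i for i, doc in enumerate(corpus)
        if any(contains_phrase(doc, p) for p in include)
        and not any(contains_phrase(doc, p) for p in exclude)
    }


def simulated_nucleation(corpus, seeds, is_relevant,
                         min_count=2, max_iterations=10):
    include, exclude = set(seeds), set()
    core = retrieve(corpus, include, exclude)
    for _ in range(max_iterations):
        good = Counter(b for i in core if is_relevant(corpus[i])
                       for b in bigrams(corpus[i]))
        bad = Counter(b for i in core if not is_relevant(corpus[i])
                      for b in bigrams(corpus[i]))
        # Expand with patterns that recur in the relevant documents.
        include |= {p for p, n in good.items()
                    if n >= min_count and p not in bad}
        # Insert terms that remove extraneous non-relevant material.
        exclude |= {p for p in bad if p not in good}
        new_core = retrieve(corpus, include, exclude)
        if new_core == core:  # converged: nothing added or removed
            break
        core = new_core
    return core, include, exclude


corpus = [
    "SATELLITE REMOTE SENSING OF OCEAN COLOR",
    "SATELLITE REMOTE SENSING OF SEA ICE",
    "SATELLITE VIRUS RNA IN TOBACCO",
    "REMOTE SENSING WITH SYNTHETIC APERTURE RADAR",
    "REMOTE SENSING BY SYNTHETIC APERTURE RADAR FROM ORBIT",
    "SYNTHETIC APERTURE RADAR IMAGING OF SEA ICE",
    "GENOME OF A PLANT VIRUS",
]


def analyst_judges_relevant(title):
    # Stand-in for the human relevance judgment (an assumption).
    return "VIRUS" not in title


core, include, exclude = simulated_nucleation(
    corpus, {"SATELLITE"}, analyst_judges_relevant)
```

Starting from the single seed SATELLITE, the sketch drops the virus title, picks up the radar titles through recurring word pairs, and stops when an iteration leaves the core unchanged.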

APPLICATION  OF  SIMULATED  NUCLEATION  FOR  INFORMATION  RETRIEVAL 

A)  Database  and  Field  Selection 

A  recent  application  of  Simulated  Nucleation  will  elucidate 
and  clarify  the  above  principles.  It  was  desired  to  generate  a 
compendium  of  journal  articles  on  Space  Science  and  Technology, 
with  emphasis  on  near-earth  space.  The  database  generated  would 
then  be  analyzed  using  the  standard  Database  Tomography  procedure 
[Kostoff, 1993f, 1994h, 1995e]. As a source for these articles,
two  major  databases  were  employed:  the  Science  Citation  Index, 
which  includes  over  3200  research  (mainly  basic)  journals,  and  the 
Engineering  Compendex,  which  includes  over  2600  research  (mainly 
applied)  and  technology  and  engineering  journals.  The  remainder  of 
this  paper  describes  the  query  of  the  Science  Citation  Index,  since 
the  Engineering  Compendex  query  utilized  the  same  approach. 

For  the  present  study,  the  relevant  output  fields  for  a  given 
article from the Science Citation Index include: author(s), title,
journal, author(s) address(es), author's keywords, keywords plus,
abstract, references, and related records. While most of the
papers have most of the fields, not all papers have all fields. In
terms of frequency of fields included, all papers obtained in the
space science and technology search seem to have a title and
journal field, most have an author and address field, almost as
many have an abstract field, a roughly similar number have
references  and  related  records  fields,  and  perhaps  half  have 
keyword  fields. 

B)  Initialization  of  Database 

The  search  was  initialized  on  the  Science  Citation  Index  using 
some core search terms which would generate a group of papers which
directly  addressed  the  topics  of  interest.  Based  on  previous 
experiments,  the  search  term-field  combination  which  had  the 
highest  probability  of  identifying  relevant  papers  was  selected 
initially.  This  combination  consisted  of  near-earth-space-related 
multi-word  phrases  applied  to  the  title  and  keyword  fields  (e.g., 
SATELLITE  IMAG*,  SATELLITE  SENS*,  SATELLITE  OBSERVATION*,  where  * 
denotes the multi-character wildcard). Sixty search terms were
used  for  the  initialization,  and  this  produced  401  unique  journal 
articles,  over  95  percent  of  which  were  considered  as  very  relevant 
to the topical area of interest. Another approach was also tried
for the initialization. The purpose was to reduce the number of
search  terms,  since  previous  searches  with  the  same  computer- 
database  combination  had  resulted  in  operational  problems  as  the 
number  of  search  terms  became  very  large.  The  term  SATELLITE  was 
used  in  the  title  and  keywords,  and  811  papers  were  identified,  of 
which  about  70  percent  were  near-earth-space  related.  About  20 
word combinations were identified which reflected non-space-related
topics (e.g., SATELLITE VIRUS, SATELLITE AND DNA, SATELLITE
CLINICS), and these word combinations were inserted into the title
and  keyword  search  queries  after  the  NOT  operator,  which  removed 
their  parent  documents  from  the  database.  The  resultant  657 
documents,  85  percent  of  which  were  space  related,  were  used  as  the 
initial  core  database. 
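
The second initialization approach, a broad term followed by NOT exclusions, can be sketched as a title filter. This is a minimal sketch assuming titles are plain strings and that a trailing * is the multi-character wildcard. The helper names term_to_regex and select are hypothetical, only contiguous phrases are handled (Boolean exclusions such as SATELLITE AND DNA are not), and the code is not the query syntax of the Science Citation Index.

```python
import re


def term_to_regex(term):
    """Compile a phrase such as 'SATELLITE IMAG*' into a regex.

    A trailing * on a word is treated as the multi-character wildcard.
    """
    parts = []
    for word in term.split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            parts.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def select(titles, include_terms, not_terms):
    """Keep titles matching any include term and none of the NOT terms."""
    include = [term_to_regex(t) for t in include_terms]
    exclude = [term_to_regex(t) for t in not_terms]
    return [
        title for title in titles
        if any(p.search(title) for p in include)
        and not any(p.search(title) for p in exclude)
    ]


titles = [
    "Satellite imagery of sea ice",
    "Satellite virus of tobacco",
    "Staffing of satellite clinics",
    "Orbital debris models",
]
core_titles = select(titles, ["SATELLITE"],
                     ["SATELLITE VIRUS", "SATELLITE CLINICS"])
```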

C)  Iterative  Query  Process 

At  this  point,  the  first  iterative  step  of  Simulated 
Nucleation  was  taken. 

1)  One  database  was  constructed  of  the  journal  article  titles, 
and  another  database  was  constructed  of  the  abstracts. 

2)  The  multi-word  phrase  frequency  analysis  component  of 
Database  Tomography  was  applied  to  each  database,  and  the  high 
frequency  single  word,  adjacent  double  word,  and  adjacent  triple 
word  phrases  in  each  database  were  extracted  and  ordered  by 
decreasing  frequency  (e.g.,  MICROGRAVITY,  ORBITAL  DEBRIS,  SYNTHETIC 
APERTURE RADAR).

3)  The  multi-word  phrase  proximity  analysis  component  was 
applied  selectively  to  each  database.  In  particular,  a  few  high 
frequency  phrases  from  the  phrase  frequency  analysis  which  had 
multiple  meanings,  one  of  which  was  space-related,  were  used  as 
theme  words  about  which  a  word  frequency  dictionary  was  constructed 
(e.g.,  SPACE,  SATELLITE). 

4)  All  the  keywords  for  each  paper  were  combined,  and  the  list 
was  sorted  in  order  of  decreasing  frequency. 

5)  Based  on  parallel  interpretation  of  the  keywords,  phrase 
frequency  analysis  and  the  phrase  proximity  analysis,  different 
combinations of search procedures-fields were added to the search
query. 

5a)  Additional  space-related  multi-word  phrases  were 
added  to  the  keywords  and  title  fields  search  (e.g.,  ORBITAL 
DEBRIS,  EARTH  ORBIT). 

5b)  Boolean  combinations  of  phrases  were  added  to  the 
title  and  keyword  field  queries.  These  phrases  had  the  property 
that  individually  they  were  not  near-earth-space  unique,  but  in 
combination  they  appeared  much  more  relevant  topically  (e.g., 
SATELLITE  AND  IMAG*,  SPACE  AND  OCEAN).  This  expansion  into  a 
Boolean search has both advantages and disadvantages. For those
Boolean expressions derived from a multi-word phrase (e.g.,
SATELLITE IMAG* -> SATELLITE AND IMAG*), more topical papers can
be retrieved because the Boolean query terms (SATELLITE AND IMAG*)
are less restrictive compared to the parent multi-word phrase
(SATELLITE IMAG*). On the other hand, removing the restrictions of
close multi-word linkages also allows the possibility that non-
topical papers will be obtained.

One  value  of  the  phrase  proximity  analysis  of  the  title 
field is that some estimate of the value of expanding a multi-word
phrase query (SATELLITE SENSOR) into a Boolean query (SATELLITE AND
SENSOR)  can  be  extracted  from  the  phrase  proximity  results.  For 
example,  consider  the  following  case.  Assume  that  SATELLITE  is  the 
theme  of  a  phrase  proximity  analysis.  Assume  further  that 
SATELLITE  occurs  100  times  in  the  total  title  database.  In  the 
word  frequency  dictionary  constructed  around  the  theme  SATELLITE, 
assume  that  SENSOR  occurs  100  times  in  the  domain  within  fifty 
words  of  SATELLITE,  which  means  that  SENSOR  occurs  100  times  in  the 
same title as SATELLITE. This means that the Inclusion index based
on the theme word, Ij (the number of occurrences of SENSOR within
fifty words of SATELLITE divided by the total number of occurrences
of SATELLITE in the database), is high (here 100/100, or one). If Ij
is low, there is
little motivation to use the word combination in either form.
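
The theme-based Inclusion index Ij defined above can be computed directly from tokenized records. The following is a minimal sketch, assuming whitespace-tokenized upper-case titles without punctuation; the function name inclusion_index_theme is hypothetical.

```python
def inclusion_index_theme(records, theme, member, window=50):
    """Ij: occurrences of member within `window` words of theme,
    divided by the total number of occurrences of theme."""
    theme_total = 0
    member_near = 0
    for record in records:
        words = record.upper().split()
        theme_positions = [i for i, w in enumerate(words) if w == theme]
        theme_total += len(theme_positions)
        for i, w in enumerate(words):
            if w == member and any(abs(i - j) <= window
                                   for j in theme_positions):
                member_near += 1
    return member_near / theme_total if theme_total else 0.0


titles = [
    "SATELLITE SENSOR CALIBRATION",
    "SENSOR DESIGN FOR A SATELLITE",
    "SATELLITE ORBIT DETERMINATION",
    "SATELLITE LASER RANGING",
]
ij_mixed = inclusion_index_theme(titles, "SATELLITE", "SENSOR")

# The limiting case in the text: SENSOR accompanies every SATELLITE.
ij_limit = inclusion_index_theme(["SATELLITE SENSOR"] * 100,
                                 "SATELLITE", "SENSOR")
```

Here ij_mixed is 0.5 (two of four SATELLITE occurrences have SENSOR nearby) and ij_limit is 1.0.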

Assume further that neither SATELLITE SENSOR nor SATELLITE AND
SENSOR was used in previous queries. Examine two limiting
conditions.

In  the  first  condition,  the  multi-word  phrase  SATELLITE 
SENSOR  occurs  100  times  in  the  domain  within  fifty  words  of 
SATELLITE.  This  means  that  whenever  SATELLITE  and  SENSOR  appear 
together  in  the  title,  they  appear  as  the  multi-word  phrase 
SATELLITE  SENSOR.  In  this  situation,  the  pattern  SATELLITE  SENSOR 
is  the  one  that  has  always  occurred  in  the  title  of  the  relevant 
papers  obtained  so  far,  and  the  pattern  SATELLITE  AND  SENSOR  (where 
they  are  disconnected)  is  the  one  that  has  never  occurred  in  the 
title  of  the  relevant  papers.  Since  the  philosophy  of  Simulated 
Nucleation  is  to  extrapolate  from  known  patterns  within  relevant 
papers  to  obtain  new  search  terms,  then  the  pattern  SATELLITE 
SENSOR  would  be  used  as  a  new  search  term  for  the  title  and 
(probably)  keyword  fields  in  the  next  query  iteration. 

In  the  second  condition,  the  multi-word  phrase  SATELLITE 
SENSOR  occurs  zero  times  in  the  domain  within  fifty  words  of 
SATELLITE.  This  means  that  whenever  SATELLITE  and  SENSOR  appear 
together  in  the  title,  they  never  appear  as  the  multi-word  phrase 
SATELLITE SENSOR. This example covers two conditions: where a real
multi-word phrase (SATELLITE SENSOR) has low frequency of
occurrence, or where the two search terms never form a multi-word
phrase (SATELLITE - SEA SURFACE TEMPERATURE). In this situation,
the pattern SATELLITE AND SENSOR is the one that has always
occurred  in  the  title  of  the  relevant  papers  obtained  so  far,  and 
the  pattern  SATELLITE  SENSOR  is  the  one  that  has  never  occurred  in 
the  title  of  the  relevant  papers.  Since  the  philosophy  of 
Simulated  Nucleation  is  to  extrapolate  from  known  patterns  within 
relevant  papers  to  obtain  new  search  terms,  then  the  pattern 
SATELLITE  AND  SENSOR  would  be  used  as  a  new  search  term  for  the 
title  and  (probably)  keyword  fields  in  the  next  query  iteration. 
This query pattern identification procedure for the title field can
be extrapolated to the abstract field for the case where the
spacing between the query search terms can be controlled (term a
within X words of term b).

A note of caution is required here. In the case where
SATELLITE  occurs  100  times  in  the  title,  SENSOR  occurs  100  times  in 
the  domain  within  fifty  words  of  SATELLITE,  and  SATELLITE  SENSOR 
occurs  100  times  in  the  domain  within  fifty  words  of  SATELLITE, 
then  the  correct  conclusion  to  be  drawn  is  that  for  space-related 
papers,  whenever  SATELLITE  and  SENSOR  occur  together  in  the  title, 
they  occur  as  a  word  pair.  However,  in  a  general  title  search  of 
all  SCI  papers,  SATELLITE  and  SENSOR  could  co-occur  theoretically 
in  the  title  many  times  in  non-space-related  papers. 
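
The two limiting conditions above amount to asking how often two words that share a title are adjacent. The following is a minimal sketch of that decision, assuming whitespace-tokenized upper-case titles. The function names and the 0.5 threshold for intermediate cases are assumptions; the text addresses only the two limits. As the note of caution states, the result describes the core database only, not the full source database.

```python
def adjacency_fraction(titles, first, second):
    """Fraction of titles containing both words in which they occur
    as the adjacent phrase 'first second'. None if they never co-occur."""
    both = adjacent = 0
    for title in titles:
        words = title.upper().split()
        if first in words and second in words:
            both += 1
            if any(a == first and b == second
                   for a, b in zip(words, words[1:])):
                adjacent += 1
    return adjacent / both if both else None


def suggest_query(titles, first, second, threshold=0.5):
    fraction = adjacency_fraction(titles, first, second)
    if fraction is None:
        return None  # no known pattern to extrapolate from
    if fraction >= threshold:
        return f"{first} {second}"      # multi-word phrase query
    return f"{first} AND {second}"      # Boolean query


always_adjacent = ["SATELLITE SENSOR CALIBRATION", "A NEW SATELLITE SENSOR"]
never_adjacent = ["SATELLITE DATA FROM A MICROWAVE SENSOR",
                  "SENSOR NOISE IN SATELLITE IMAGES"]

first_condition = suggest_query(always_adjacent, "SATELLITE", "SENSOR")
second_condition = suggest_query(never_adjacent, "SATELLITE", "SENSOR")
```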

Another  value  of  the  phrase  proximity  analysis  of  the  title 
field  is  that  guidance  is  provided  into  expanding  the  search  into 
specific  areas  beyond  the  realm  of  near-earth-space.  For  example, 
assume  the  proximity  analysis  shows  that  a  close  relationship 
exists  between  SATELLITE  and  MICROWAVE  RADIOMETER.  This 
combination  could  be  added  to  the  title  and  keyword  search  as  a 
Boolean.  Then,  a  title  search  of  MICROWAVE  RADIOMETER  alone  could 
be  done  to  obtain  information  on  the  technology  beyond  its  space 
applications.  These  expansion  searches  could  be  performed  during 
the  database  analytic  phase.  This  benefit  of  phrase  proximity 
analysis  for  the  title  field  may  be  extrapolated  to  the  abstract 
field  in  the  case  where  the  spacing  between  the  search  term 
combination  can  be  controlled. 

5c)  Additional  space-related  multi-word  phrases  were 
added  to  the  abstract  field  search  (e.g.,  ORBITAL  DEBRIS,  EARTH 
ORBIT).

5d)  For  databases  where  the  spacing  between  the  search 
terms  could  not  be  controlled  in  the  abstract,  Boolean  search  terms 
should  not  be  added  to  the  abstract  field  to  broaden  the  query. 
Based  on  experiments  performed  on  the  title  and  abstract  fields 
using  word  pairs  (e.g.,  SATELLITE  SENSOR)  and  their  Boolean 
derivatives  (e.g.,  SATELLITE  AND  SENSOR),  it  was  found  that  while 
word  pair  queries  on  either  field  gave  good  results,  and  Boolean 
queries  on  the  title  and  keywords  gave  good  results,  Boolean 
queries  on  the  SCI  abstract  field  with  no  spacing  control  between 
the  terms  introduced  many  non-space-related  papers.  These  poor 
results  were  due  to  the  individual  terms  being  located  sufficiently 
distant  in  the  abstract  that  their  contexts  were  different. 

However,  for  databases  which  allow  control  of  the  spacing 
between  the  Boolean  search  terms  (e.g.,  SATELLITE  within  10  words 
of SENSOR), a Boolean query could be used for the abstract.
Both  the  SCI  and  Engineering  Compendex  databases  allow  combinations 
of  terms  to  be  limited  to  the  same  sentence  in  the  abstract,  which 
places  them  in  the  same  context.  In  selecting  which  combinations 
of  terms  to  use  for  the  query,  priority  should  be  given  to  those 
terms which received a high Inclusion index Ii (ratio of a term's
frequency  of  appearance  within  x  words  of  the  theme  word  to  the 
term's  frequency  of  appearance  in  the  total  database)  from  the 
multi-word  proximity  analysis.  For  example,  assume  SATELLITE  is 
the  theme  word  of  a  proximity  analysis  cluster,  and  SENSOR  is  a 
member  within  the  cluster.  Assume  SATELLITE  occurs  100  times  in 
the  abstract  database,  and  SENSOR  occurs  100  times  in  the  abstract 
database. Consider two cases.

In the first case, SENSOR occurs 100 times in the domain
within  fifty  words  of  SATELLITE.  Its  Inclusion  index  would  be  one, 
and  every  time  SENSOR  appears  in  the  abstract  of  the  space-related 
papers  obtained  so  far,  it  appears  within  fifty  words  of  SATELLITE. 
It  therefore  would  be  recommended  that  SENSOR  AND  SATELLITE  be 
included  in  a  Boolean  search  of  the  abstract  where  the  spacing 
could  be  controlled. 

In  the  second  case,  SENSOR  occurs  zero  times  in  the  domain 
within fifty words of SATELLITE. Its Inclusion index would be
zero: SENSOR appears in the abstracts of the space-related papers
obtained so far, but never within fifty words
of SATELLITE. It therefore would be recommended that SENSOR AND
SATELLITE  not  be  included  in  a  Boolean  search  of  the  abstract  where 
the  spacing  could  be  controlled,  since  there  is  no  evidence  that 
space-related  papers  would  be  obtained  by  the  query. 

For the space-related literature search, most of the Boolean
searches in the abstract field involved the combination of three
terms. Two of the terms would typically have high values of the
Inclusion index Ii, would typically not be space-unique even in
combination (e.g., RADIOMETER AND SEA SURFACE TEMPERATURE), and
would be constrained to occur in the same sentence. A third term,
one of the multiple-definition space terms (e.g., SATELLITE, SPACE,
EARTH ORBIT), was required to occur somewhere in the abstract, but
not necessarily in the same sentence as the other two terms (e.g.,
SATELLITE AND (RADIOMETER SAME SEA SURFACE TEMPERATURE), where SAME
is the SCI operator which constrains the two terms to the same
sentence in the abstract). Experiments were run, and it was
discovered that this approach captured all the space-related papers
and eliminated all the non-space-related papers.
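
The three-term query form above can be emulated on plain abstracts. The following is a minimal sketch, assuming sentences end with a period, question mark, or exclamation mark followed by whitespace, and that query terms are given in upper case. It imitates the effect of the SAME operator for illustration only; it is not the SCI implementation, and the function names are hypothetical.

```python
import re


def sentences(abstract):
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.upper()) if s]


def contains(text, phrase):
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def same(abstract, term_a, term_b):
    """True if both terms occur together in at least one sentence."""
    return any(contains(s, term_a) and contains(s, term_b)
               for s in sentences(abstract))


def matches(abstract, anywhere_term, term_a, term_b):
    """anywhere_term AND (term_a SAME term_b)."""
    return (contains(abstract.upper(), anywhere_term)
            and same(abstract, term_a, term_b))


a1 = ("A satellite mission is described. The radiometer retrieves "
      "sea surface temperature over the Pacific.")
a2 = ("The radiometer was calibrated in the laboratory. Sea surface "
      "temperature was then measured from a ship and relayed by satellite.")
a3 = "The radiometer retrieves sea surface temperature from a buoy."

query = ("SATELLITE", "RADIOMETER", "SEA SURFACE TEMPERATURE")
results = [matches(a, *query) for a in (a1, a2, a3)]
```

The second abstract contains all three terms and would satisfy a plain Boolean AND, but it is rejected because RADIOMETER and SEA SURFACE TEMPERATURE fall in different sentences.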

5e)  Steps  5a)  to  5d)  reflect  use  of  phrase  frequency  and 
proximity  results  to  add  new  search  terms  to  the  query.  However, 
these  results  were  also  used  in  the  same  manner  to  identify  phrases 
which  reflected  non-space  oriented  papers.  These  phrases  (e.g., 
SATELLITE  TOBACCO,  SATELLITE  RNA*)  were  inserted  into  the  query  in 
various  combinations  in  conjunction  with  the  Boolean  NOT  operator, 
and  their  parent  documents  were  removed  from  the  database. 
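
The phrase frequency analysis of step 2), which supplies the candidate phrases used throughout steps 5a) to 5e), can be sketched as a count of adjacent word sequences. This is a minimal sketch assuming whitespace tokenization and no stop-word filtering; the function name phrase_frequencies is hypothetical and the sample titles are invented for illustration.

```python
from collections import Counter


def phrase_frequencies(records, n):
    """Adjacent n-word phrases ordered by decreasing frequency."""
    counts = Counter()
    for record in records:
        words = record.upper().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common()


sample_titles = [
    "ORBITAL DEBRIS MODELS",
    "ORBITAL DEBRIS FROM SYNTHETIC APERTURE RADAR",
    "SYNTHETIC APERTURE RADAR IMAGING",
]
single_words = phrase_frequencies(sample_titles, 1)
double_words = phrase_frequencies(sample_titles, 2)
triple_words = phrase_frequencies(sample_titles, 3)
```

The same function applied to the combined keyword lists gives the sorted keyword frequencies of step 4).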

The  iterative  process  described  above  was  repeated  until 
convergence  was  obtained.  In  particular,  the  relationship  between 
the  number  of  iterations  and  the  cumulative  number  of  space-related 
papers  was  tracked,  and  convergence  was  defined  to  occur  when  the 
slope  of  this  curve  became  small  (addition  of  a  new  iteration 
resulted in very few new papers).
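
The convergence test described above can be stated as a check on the last step of the cumulative curve. The following is a minimal sketch; the function name converged, the 2 percent tolerance, and the sample counts are assumptions, since the text defines convergence only qualitatively as a small slope.

```python
def converged(cumulative_relevant, tolerance=0.02):
    """True when the latest iteration added fewer new relevant papers
    than `tolerance`, expressed as a fraction of the running total."""
    if len(cumulative_relevant) < 2:
        return False
    previous, latest = cumulative_relevant[-2], cumulative_relevant[-1]
    if latest == 0:
        return False
    return (latest - previous) / latest < tolerance


# Hypothetical cumulative counts of space-related papers per iteration.
still_growing = converged([500, 900, 1000])
levelled_off = converged([500, 900, 1000, 1010])
```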

D)  Additional  Querying  Capabilities  not  Utilized 

Before presenting results and conclusions, some discussion is
provided about capabilities of Database Tomography, and other
querying capabilities, which were not utilized in the present study.

1)  Database  Tomography  was  applied  only  to  the  database  of 
space-related  articles.  It  was  not  applied  to  the  total  SCI  or 
Engineering  Compendex  database  for  a  number  of  logistic  reasons. 
Application of Database Tomography to the total SCI abstracts, for
example, could help increase the probability of selecting space-
related papers from an abstract search, even without the capability
to  control  the  spacing  between  the  terms. 

Consider the following case. SATELLITE occurs 1000 times in
the  total  abstract  database.  SENSOR  occurs  1000  times  in  the  total 
abstract  database.  Perform  a  proximity  analysis  about  the  theme 
SATELLITE,  parametrically  varying  the  spacing  of  the  domain.  If 
the Inclusion index Ii of SENSOR is very high for the highest
domain value (say within 100 words of SATELLITE), and if the
Inclusion index Ii value remains essentially constant as the domain
size  decreases,  then  it  could  be  concluded  that  whenever  SENSOR 
occurs  in  the  abstract,  it  tends  to  occur  close  to  SATELLITE.  This 
means  the  two  terms  are  probably  related  contextually,  and  the 
query  SATELLITE  AND  SENSOR  applied  to  the  abstract  would  have  a 
high  probability  of  identifying  space-related  papers.  In  the  case 
where  the  capability  to  control  the  spacing  of  search  terms  exists, 
then  the  parametric  variation  described  above  could  be  used  to 
specify  the  spacing  desired  for  maximum  signal-to-noise. 
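
The parametric variation of the domain size described above can be sketched as a sweep of the member-based Inclusion index Ii over several window sizes. This is a minimal sketch assuming whitespace-tokenized abstracts without punctuation; the function names and the particular window sizes are assumptions.

```python
def inclusion_index_member(abstracts, theme, member, window):
    """Ii: occurrences of member within `window` words of theme,
    divided by the total number of occurrences of member."""
    member_total = 0
    member_near = 0
    for abstract in abstracts:
        words = abstract.upper().split()
        theme_positions = [i for i, w in enumerate(words) if w == theme]
        for i, w in enumerate(words):
            if w == member:
                member_total += 1
                if any(abs(i - j) <= window for j in theme_positions):
                    member_near += 1
    return member_near / member_total if member_total else 0.0


def sweep(abstracts, theme, member, windows=(100, 50, 25, 10, 5)):
    return {w: inclusion_index_member(abstracts, theme, member, w)
            for w in windows}


# SENSOR always sits next to SATELLITE: Ii stays constant as the
# domain shrinks, suggesting the terms are contextually related.
tight = sweep(["THE SATELLITE SENSOR WAS CALIBRATED",
               "DATA FROM A SATELLITE SENSOR ARE SHOWN"],
              "SATELLITE", "SENSOR")

# SENSOR sits 31 words from SATELLITE: Ii collapses once the domain
# is narrower than that spacing.
loose = sweep(["SATELLITE " + "FILLER " * 30 + "SENSOR"],
              "SATELLITE", "SENSOR")
```

The window size at which Ii begins to fall gives one way to choose the spacing for a proximity-controlled query.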

2)  Not  all  fields  in  the  SCI  or  Engineering  Compendex  were 
used  in  the  iterative  procedure.  For  example,  there  is  an  SCI 
field  titled  Related  Records.  It  refers  to  other  journal  articles 
which  share  some  of  the  references  in  the  subject  article.  This 
field  could  have  been  used  to  expand  the  search  in  the  iterative 
procedure  through  the  co-citation  relationships.  The  authors 
believe,  however,  that  the  co-citation  relationships  are  not  as 
direct  as  the  co-word  relationships  because  intellectual  linkages 
are only one of many reasons for citing references (Kostoff, 1997a;
MacRoberts, 1996).

3)  More  complex  search  terms  were  not  used.  For  example,  the 
search tree approach, where terms are weighted by perceived
importance  and  then  summed  to  provide  a  figure  of  merit,  was  not 
used.  It  might  have  some  real  value  in  the  last  iteration  (when 
the  terms  identified  by  the  database  language  have  been  obtained) 
for  focusing  the  database  output. 

4)  The  above  unused  capabilities,  and  many  more,  are 
compatible  with  Simulated  Nucleation,  and  could  be  employed  if 
desired.  For  example,  if  an  analyst  feels  that  the  co-citation 
iterative  approach  would  provide  value  to  the  search  over  and  above 
the co-word approach, then it need merely be incorporated into
the  search  process  with  no  disruptive  effects. 
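
The weighted-term idea in item 3) above, where term weights are summed into a figure of merit, can be sketched as follows. This is a minimal sketch and not the search tree approach itself: the weights are invented for illustration, terms are matched as simple substrings, and the function names are hypothetical.

```python
def figure_of_merit(text, weights):
    """Sum the weights of all terms that appear in the text."""
    upper = text.upper()
    return sum(weight for term, weight in weights.items() if term in upper)


def rank(records, weights):
    """Order records by decreasing figure of merit."""
    return sorted(records,
                  key=lambda record: figure_of_merit(record, weights),
                  reverse=True)


# Hypothetical weights reflecting perceived importance.
weights = {"EARTH ORBIT": 3.0, "SENSOR": 2.0, "SATELLITE": 1.0}
records = [
    "Staffing of satellite clinics",
    "Satellite sensor performance in low earth orbit",
    "Satellite sensor calibration",
]
ranked = rank(records, weights)
```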

RESULTS  AND  CONCLUSIONS 

In  the  initial  SCI  extracted  database,  there  were  657  records, 
of  which  about  15%  were  judged  by  detailed  reading  of  the  records' 
abstracts  to  be  not  related  to  the  topic  of  interest.  Word 
frequency  and  word  proximity  analyses  were  performed  on  this 
initial  database,  and  the  results  were  used  to  both  expand  and 
refine  the  search  query.  The  main  focus  was  on  expanding  the 
search  query  using  double  and  triple  word  phrases.  This  first 
iterative  step  resulted  in  a  much  larger  extracted  database  of  1476 
records, of which about 10.5 percent were judged to be non-related
to the topic.

For  the  next  iterative  step,  more  emphasis  was  placed  on 
expanding the search query using non-contiguous words which co-
occurred in the same sentence, as well as removing non-applicable
records. There were relatively few new word pairs or triplets to
add to the query, since the high and mid-frequency word
combinations remained unchanged. This broader expansion of the
search  query  using  non-contiguous  words  added  relatively  more  non- 
related  records  than  the  contiguous  word  expansion  of  the  previous 
iteration,  which  was  balanced  by  the  removal  of  non-applicable 
records  using  the  word  frequency  and  proximity  results.  This 
second  iterative  step  resulted  in  a  moderately  larger  extracted 
database  of  1726  records,  of  which  about  ten  percent  were  judged 
non-applicable. 

The  final  iterative  step  focused  on  removing  the  non- 
applicable  records  by  adding  much  lower  frequency  terms  from  the 
phrase  frequency  and  proximity  results  to  the  search  query,  since 
the  word  frequency  and  proximity  analyses  did  not  yield  any  new 
high  or  even  moderate  frequency  contiguous  or  non-contiguous  word 
combinations. This final step resulted in 1642 records, of which
about  5.5  percent  were  judged  to  be  non-applicable.  Figure  4  shows 
the  trend  of  non-applicable  record  percentages  with  number  of 
iterations,  and  Figure  5  shows  the  trend  of  applicable  records  with 
number  of  iterations.  These  figures  provide  graphical  evidence 
that  an  efficiently  tailored  database  has  been  obtained. 
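
The record counts and non-applicable percentages reported above imply approximate numbers of applicable records at each step. The following worked arithmetic is a sketch only: the percentages are quoted as approximate in the text, so the derived counts are estimates rather than reported results.

```python
# (step, total records, approximate percent judged non-applicable)
iterations = [
    ("initial", 657, 15.0),
    ("first", 1476, 10.5),
    ("second", 1726, 10.0),
    ("final", 1642, 5.5),
]


def applicable_estimate(total, percent_non_applicable):
    return round(total * (1 - percent_non_applicable / 100))


applicable = {name: applicable_estimate(total, pct)
              for name, total, pct in iterations}
non_applicable = {name: total - applicable[name]
                  for name, total, _ in iterations}
# applicable     -> initial 558, first 1321, second 1553, final 1552
# non_applicable -> initial 99,  first 155,  second 173,  final 90
```

On these estimates the final step left the number of applicable records essentially unchanged while roughly halving the non-applicable ones, which is consistent with the convergence described in the text.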

Adding  terms  to  the  search  query  to  expand  the  number  of  valid 
records  is  a  straightforward  process,  whereas  adding  terms  to  the 
search  query  to  remove  unwanted  records  proved  to  be  less 
efficient.  The  approach  used  to  initialize  the  database, 
retrieving  all  records  with  SATELLITE*  in  the  title,  then 
subtracting  those  records  with  obviously  non-relevant  contiguous 
word combinations in the title (SATELLITE CLINICS, SATELLITE DNA),
introduced  a  substantial  number  of  non-applicable  records  in  the 
extracted  database.  To  finally  remove  a  reasonable  fraction  of  the 
non-applicable  records,  it  was  necessary  to  examine  the  moderate  to 
low  frequency  terms  from  the  word  frequency  and  proximity  results, 
and  add  these  terms  to  the  query.  There  proved  to  be  a  practical 
limitation  to  this  process.  In  the  proximity  or  frequency  outputs, 
typically  more  than  half  of  the  phrases  have  frequencies  of  three 
or  less,  and  three  is  typically  used  as  an  output  cutoff  to  keep 
the  outputs  manageable.  Many  of  the  non-applicable  records  tend  to 
have  very  low  frequency  patterns,  and  therefore  could  not  be 
identified  from  the  frequency  analyses  in  an  efficient  manner. 

Adding  non-contiguous  terms  to  the  query  resulted  in  both  more 
applicable  papers  than  would  have  been  obtained  with  contiguous 
terms  and  more  non-applicable  papers  as  well.  The  removal  of  non- 
applicable  papers  encountered  the  same  inherent  difficulties  as 
described in the previous paragraph. The major sources of non-
applicable papers from the non-contiguous terms were words
like SPACE, which, when not employed as part of a word pair or
triplet, had many meanings in addition to the topic of interest
(e.g., TIME and SPACE, HALF-SPACE, SPACE SYMMETRY). Adding contiguous
terms to the query resulted in the least addition of non-applicable
records.  The  first  approach  to  initializing  the  database  of 
extracted  records  demonstrated  the  efficiency  of  contiguous  term 
addition, where only a few percent of the roughly 400 records were non-
applicable.

The three iterative steps required for the present search
process are typical of past experience with DT for information
retrieval. For detailed research-oriented databases such as the
SCI, which have unique vocabularies and multiple meanings for
the same word (as in the case of SATELLITE in this paper),
another iterative step may be required. For technology-oriented
databases such as the EC, which tend to have less vocabulary
diversity and less esoteric vocabulary, fewer iterative steps are
necessary.  In  the  present  study,  for  example,  the  word  SATELLITE 
in  the  EC  had  the  commonly  used  meaning  of  a  man-made  object 
orbiting  the  earth,  and  essentially  every  record  retrieved  was 
applicable  to  the  topic  of  interest.  For  less  technical  databases 
(such as programmatic narrative databases), which have more generic
vocabularies,  sometimes  two  iterative  steps  may  be  all  that  are 
necessary.  More  iterative  steps  would  be  necessary  if  the  analyst 
wishes  to  either  add  or  remove  records  reflective  of  low  frequency 
phrase  occurrences.  Thus,  for  the  mid  to  high  frequency  themes  of 
prime  interest,  the  iteration  process  converges  quite  rapidly,  but 
because  of  the  continuing  growth  feature  of  Simulated  Nucleation, 
there  will  always  be  a  few  records  added  with  corresponding 
addition  of  low  frequency  phrase  terms  as  well. 

In  summary,  DT  is  a  powerful  tool  for  retrieving  records  from 
familiar  or  unfamiliar  databases  with  a  high  degree  of  accuracy, 
and  is  sufficiently  flexible  to  be  combined  with  other  information 
retrieval techniques such as search tree or co-citation approaches.

responses to the requirements of the Government Performance and
Results Act of 1993 (GPRA).

PERFORMANCE MEASURES FOR GOVERNMENT-SPONSORED RESEARCH, Special Issue
of the Journal SCIENTOMETRICS. Kostoff, R. N. (ed.), Vol. 36, No. 3,
July-August 1996.

TABLE  OF  CONTENTS: 

*PERFORMANCE  MEASURES  FOR  GOVERNMENT-SPONSORED  RESEARCH:  OVERVIEW  AND 
BACKGROUND 

Ronald  N.  Kostoff  (Office  of  Naval  Research) 

*BIBLIOMETRIC  PERFORMANCE  MEASURES 

Francis  Narin  and  Kimberly  S.  Hamilton  (CHI  Research,  Inc.) 

*CROSS-FIELD  NORMALIZATION  OF  SCIENTOMETRIC  INDICATORS 

Andreas  Schubert  and  Tibor  Braun  (Library  of  the  Hungarian 
Academy  of  Sciences) 

*ECONOMIC  PERFORMANCE  MEASURES  FOR  EVALUATING  GOVERNMENT-SPONSORED 
RESEARCH 

Albert  N.  Link  (University  of  North  Carolina  at  Greensboro) 

*THE USE OF MULTIPLE INDICATORS IN THE ASSESSMENT OF BASIC RESEARCH

Ben R. Martin (University of Sussex)

*STUDYING RESEARCH COLLABORATION USING CO-AUTHORSHIPS

Goran Melin and Olle Persson (Umea University)

*INTEGRATED FIGURE OF MERIT OF PUBLIC SECTOR RESEARCH EVALUATION

Eli Geisler (Northwestern University)

*ADVANCED  BIBLIOMETRIC  METHODS  AS  QUANTITATIVE  CORE  OF  PEER  REVIEW 
BASED  EVALUATION  AND  FORESIGHT  EXERCISES 

Anthony F. J. van Raan (University of Leiden)

*BIBLIOMETRIC  INDICATORS  AND  THE  COMPETITIVE  ENVIRONMENT  OF  R&D 
LABORATORIES 

Roger Miller and Andre Manseau (Universite du Quebec a Montreal)

*PROBLEMS OF CITATION ANALYSIS

Michael  H.  MacRoberts  and  Barbara  R.  MacRoberts  (Bog  Research) 


*CONFORMING  THE  GOVERNMENT  R&D  FUNCTION  WITH  THE  REQUIREMENTS  OF  THE 
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X-C.  The  following  special  issue  of  the  Journal  of  Technology 
Transfer  focuses  on  the  problem  of  converting  science  to  technology. 
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ARUNACHALAM-S-1985-SCIENTOMETRICS-V8-P301  8 
ZIMAN-JM-1968-PUBLIC-KNOWLEDGE-SOC  7
WILSON-JD-1978-J-CLIN-INVEST-V61-P1697  7
WHITLEY-R-1984-INTELLECTUAL-SOCIAL  7
VINKLER-P-1988-SCIENTOMETRICS-V14-P453  7
STINSON-ER-1987-J-INFORM-SCI-V13-P65  7 
SCOTT-WA-1974-AM-PSYCHOL-V29-P698  7 
RUSHTON-P-1984-B-BRIT-PSYCHOL-SOC-V37-P33  7
RUBENSTEIN-LV-1990-JAMA-J-AM-MED-ASSOC-V264-P1974
RELMAN-AS-1989-NEW-ENGL-J-MED-V321-P827  7
RAVNSKOV-U-1992-BMJ-V305-P15  7
PRICE-DD-1970-COMMUNICATION-SCI-EN-P3  7
PRICE-DD-1986-LITTLE-SCI-BIG-SCI  7 
PETERS-HPF-1993-RES-POLICY-V22-P47  7 
PERITZ-BC-1981-LIBR-RES-V3-P47  7 
PERITZ-BC-1983-SCIENTOMETRICS-V5-P211  7 
OXMAN-AD-1991-J-CLIN-EPIDEMIOL-V44-P91  7 
NOUR-MM-1985-LIBR-INFORM-SCI-RES-V7-P261  7 
MOED-HF-1989-SCIENTOMETRICS-V15-P473  7
MCCAIN-KW-1990-J-AM-SOC-INFORM-SCI-V41-P433  7
MCCAIN-KW-1989-J-AM-SOC-INFORM-SCI-V40-P110  7
LONG-JS-1978-AM-SOCIOL-REV-V43-P889  7
LINE-MB-1974-J-DOC-V30-P283  7
KOSTOFF-RN-1993-ASSESSING-R-D-IMPACT  7
KENDALL-MG-1960-OPERATIONAL-RESEARCH-V11-P31  7
JOHNES-G-1988-HIGHER-ED-Q-V42-P54  7
INGELFINGER-FJ-1974-AM-J-MED-V56-P686  7
HAYNES-RB-1990-ANN-INTERN-MED-V112-P78  7
GUBA-EG-1989-4TH-GENERATION-EVALU  7
GRILICHES-Z-1990-J-ECON-LIT-V28-P1661  7
GRIFFITH-BC-1974-SCI-STUD-V4-P339  7 
GOTTFREDSON-SD-1978-AM-PSYCHOL-V33-P920  7 
GOFFMAN-W-1969-NATURE-V221-P1205  7
GARFUNKEL-JM-1990-JAMA-J-AM-MED-ASSOC-V263-P1369  7
FRAME-JD-1977-SOC-STUD-SCI-V7-P501  7
EVANS-JT-1990-JAMA-J-AM-MED-ASSOC-V263-P1353  7
DIAMOND-AM-1986-J-HUM-RESOUR-V21-P200  7
DELACEY-G-1985-BRIT-MED-J-V291-P884  7
CRANE-D-1967-AM-SOCIOL-V2-P195  7
COZZENS-SE-1989-SCIENTOMETRICS-V15-P437  7
COLE-J-1971-AM-SOCIOL-V6-P23  7
COHEN-J-1960-EDUC-PSYCHOL-MEAS-V20-P37  7
CHUBIN-DE-1975-SOC-STUD-SCI-V5-P423  7 
CAVE-M-1988-USE-PERFORMANCE-INDI  7 
CARPENTER-MP-1988-SCIENTOMETRICS-V14-P213  7 
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CANO-V-1989-J-AM-SOC-INFORM-SCI-V40-P284  7 
BROOKES-BC-1977-J-DOC-V33-P180  7 
BRAUN-T-1987-SCIENTOMETRICS-V11-P127  7
BRAUN-T-1989-SCIENTOMETRICS-V15-P165  7
BRAUN-T-1987-SCIENTOMETRICS-V12-P3  7
BRAAM-RR-1991-J-AM-SOC-INFORM-SCI-V42-P233  7
ARUNACHALAM-S-1985-J-INFORM-SCI-V10-P165  7
ZUCKERMAN-H-1967-AM-SOCIOL-REV-V32-P391  6
WINDSOR-DA-1973-J-AM-SOC-INFORM-SCI-V24-P377  6
WHITTAKER-J-1989-SOC-STUD-SCI-V19-P473  6
WEINGART-P-1984-VERMESSUNG-FORSCHUNG  6
WATSON-PD-1985-COLL-RES-LIBR-V46-P334  6
WANNER-RA-1981-SOCIOL-EDUC-V54-P238  6
WADE-N-1975-SCIENCE-V188-P429  6
VLACHY-J-1985-SCIENTOMETRICS-V7-P505  6
VELHO-L-1986-SCIENTOMETRICS-V9-P71  6
VANRAAN-AFJ-1993-SCIENTOMETRICS-V26-P169  6
TIJSSEN-RJW-1987-SCIENTOMETRICS-V11-P351  6
THYER-BA-1986-J-SOC-WORK-EDUC-V22-P67  6
SWEETLAND-JH-1989-LIBR-QUART-V59-P291  6
SUBRAMANYAM-K-1983-J-INFORM-SCI-V6-P33  6
SPAGNOLO-F-1990-SCIENTOMETRICS-V18-P205  6
SMALL-H-1980-SCIENTOMETRICS-V2-P277  6
SIMON-HA-1955-BIOMETRIKA-V42-P425  6
SIEGEL-S-1956-NONPARAMETRIC-STATIS  6
SEGLEN-PO-1992-J-AM-SOC-INFORM-SCI-V43-P628  6
SCHUBERT-A-1983-SCIENTOMETRICS-V5-P59  6 
SCHUBERT-A-1988-HDB-QUANTITATIVE-STU-P137  6 
SCHMOOKLER-J-1966-INVENTION-EC-GROWTH  6
SARACEVIC-T-1988-J-AM-SOC-INFORM-SCI-V39-P197  6 
ROCHE-M-1982-INTERCIENCIA-V7-P279  6 
PRATT-AD-1977-J-AM-SOC-INFORM-SCI-V28-P285  6
PORTER-AL-1977-SOC-STUD-SCI-V7-P257  6
PINSKI-G-1976-INFORMATION-PROCESSI-V12-P297  6
PERSSON-O-1988-HDB-QUANTITATIVE-STU-P229  6
PERITZ-BC-1983-SCIENTOMETRICS-V5-P303  6
PERITZ-BC-1992-J-AM-SOC-INFORM-SCI-V43-P448  6
PERITZ-BC-1980-LIBR-RES-V2-P251  6 
PAVITT-K-1985-SCIENTOMETRICS-V7-P77  6 
PAO-ML-1985-INFORM-PROCESS-MANAG-V21-P305  6
NOMA-E-1982-SCIENTOMETRICS-V4-P297  6
NARIN-F-1985-SCIENTOMETRICS-V7-P369  6
NARIN-F-1991-SCIENTOMETRICS-V21-P313  6
NARIN-F-1989-EVALUATION-SCI-RES  6
MOED-HF-1989-J-INFORM-SCI-V15-P95  6
MELLON-CA-1990-NATURALISTIC-INQUIRY  6
MEADOWS-AJ-1974-COMMUNICATION-SCI  6
MAY-KO-1967-SCIENCE-V156-P890  6
MARTIN-BR-1991-SCIENTOMETRICS-V20-P333  6
MARSHAKOVA-IV-1988-SISTEMA-TSITIROVANIY  6
LUUKKONEN-T-1992-SCI-TECHNOL-V17-P101  6
LONG-JS-1980-SOC-STUD-SCI-V10-P127  6
LOMNITZ-LA-1987-SOC-STUD-SCI-V17-P115  6
LINE-MB-1978-COLLECTION-MANAGEMEN-V2-P313  6
LIEBOWITZ-SJ-1988-Q-REV-ECON-BUS-V28-P88  6
LEYDESDORFF-L-1988-SCI-PUBL-POLICY-V15-P149  6
LEYDESDORFF-L-1990-SCIENTOMETRICS-V19-P297  6
LAW-J-1988-SCIENTOMETRICS-V14-P251  6
KRONICK-DA-1990-JAMA-J-AM-MED-ASSOC-V263-P1321  6
KOSTOFF-RN-1992-3RD-INT-C-MAN-TECHN  6 
KEY-JD-1977-ARCH-PHYSICAL-MED-RE-V58-P136  6
KEEN-PGW-1980-1ST-P-INT-C-INF-SYST-P9  6 
JAYARATNE-S-1979-J-ED-SOCIAL-WORK-V15-P72  6 
HEINZKILL-R-1980-LIBRARY-Q-V50-P352  6 
HAYES-RM-1983-J-EDUC-LIBR-INF-SCI-V23-P151  6 
HARTER-SP-1992-J-AM-SOC-INFORM-SCI-V43-P602  6
HAITUN-SD-1982-SCIENTOMETRICS-V4-P5  6
GRILICHES-Z-1979-BELL-J-ECON-V10-P92  6
GOTTLIEB-MS-1981-NEW-ENGL-J-MED-V305-P1425  6
GOLDMAN-RL-1992-JAMA-J-AM-MED-ASSOC-V267-P958  6
GARLAND-K-1987-J-EDUC-LIBR-INF-SCI-V28-P87  6 
GARFIELD-E-1980-LIBRARY-Q-V50-P40  6 
GARFIELD-E-1978-METRIC-SCI-ADVENT-SC-P179  6 
GARFIELD-E-1976-NATURE-V264-P609  6
GARFIELD-E-1982-CURRENT-CONTENT-0301-P5  6
GARFIELD-E-1981-CURRENT-CONTENT-1012-P5  6
GARFIELD-E-1990-ANN-AM-ACAD-POLIT-SS-V511-P10  6
FROST-CO-1979-LIBRARY-Q-V49-P399  6
FOLLY-G-1981-SCIENTOMETRICS-V3-P135  6
EGGHE-L-1987-J-AM-SOC-INFORM-SCI-V38-P288  6
EDWARDS-GW-1984-AM-J-AGR-ECON-V66-P41  6
CULNAN-MJ-1986-MIS-QUART-V10-P289  6
COOK-TD-1979-QUASIEXPERIMENTATION  6
COLE-S-1978-PEER-REV-NATIONAL-SC  6 
COLE-S-1983-AM-J-SOCIOL-V89-P111  6 
COLE-JR-1972-SCIENCE-V178-P368  6 
CHEN-YS-1986-J-AM-SOC-INFORM-SCI-V37-P307  6 
CARPENTER-MP-1973-J-AM-SOC-INFORM-SCI-V24-P425  6
CARPENTER-MP-1981-J-AM-SOC-INFORM-SCI-V32-P430  6
CARPENTER-MP-1983-WORLD-PATENT-INFORMA-V5-P180  6
BROOKS-CH-1991-MED-CARE-V29-P755  6
BROOKES-BC-1970-J-DOC-V26-P283  6
BROADUS-RN-1971-INT-SOC-SCI-J-V23-P236  6
BRAUN-T-1991-SCIENTOMETRICS-V20-P359  6
BEYER-JM-1978-SOCIOLOGICAL-Q-V19-P68  6
BASBERG-BL-1987-RES-POLICY-V16-P131  6
BANDURA-A-1986-SOCIAL-F-THOUGHT-ACT  6
BAKER-DR-1990-SOC-WORK-RES-ABSTR-V26-P3  6
ARUNACHALAM-S-1986-J-INFORM-SCI-V12-P105  6
AM-PSYCH-ASS-1987-DIAGN-STAT-MAN-MENT  6
ALLISON-PD-1990-AM-SOCIOL-REV-V55-P469  6
ALBERT-MB-1991-RES-POLICY-V20-P251  6
AJIFERUKE-I-1988-SCIENTOMETRICS-V14-P421  6
<ANON>-1990-JAMA-J-AM-MED-ASSOC-V263-P1317  6
ZUCKERMAN-H-1968-AM-J-SOCIOL-V74-P276  5
YUTHAVONG-Y-1986-SCIENTOMETRICS-V9-P139  5
XHIGNESSE-LV-1967-AM-PSYCHOL-V22-P778  5
WILSON-EO-1975-SOCIOBIOLOGY  5 
WHITE-MJ-1977-AM-PSYCHOL-V32-P301  5 
WHITE-HD-1989-ANN-REV-INFORMATION-V24-P119  5
WHITE-HD-1981-J-AM-SOC-INFORM-SCI-V32-P163  5
WEINBERG-AM-1963-MINERVA-V1-P159  5 
VOGEL-DR-1984-DATA-BASE-V16-P3  5 
VLACHY-J-1978-SCIENTOMETRICS-V1-P109  5 
VIRGO-JA-1977-LIBR-Q-V47-P415  5
VINKLER-P-1991-SCIENTOMETRICS-V20-P145  5
VINKLER-P-1987-SCIENTOMETRICS-V12-P47  5
VELHO-L-1984-SOC-STUD-SCI-V14-P45  5
VANVIANEN-BG-1990-RES-POLICY-V19-P61  5
VANRAAN-AFJ-1990-NATURE-V347-P626  5
VANRAAN-AFJ-1989-SCIENTOMETRICS-V15-P607  5
TURNER-WA-1988-HDB-QUANTITATIVE-STU-P291  5
TOMER-C-1986-INFORM-PROCESS-MANAG-V22-P251  5
TAUBES-G-1993-SCIENCE-V260-P884  5
SQUIRES-BP-1989-CAN-MED-ASSOC-J-V141-P195  5
SOPER-ME-1976-LIBRARY-Q-V46-P397  5
SMALL-H-1985-SCIENTOMETRICS-V7-P393  5
SINGLETON-A-1976-J-DOC-V32-P258  5
SIEGELMAN-SS-1991-RADIOLOGY-V178-P637  5
SCHAMBER-L-1990-INFORM-PROCESS-MANAG-V26-P755  5
SCARR-S-1978-AM-PSYCHOL-V33-P935  5
SARACEVIC-T-1988-J-AM-SOC-INFORM-SCI-V39-P161  5
RUSHTON-JP-1989-PSYCHOL-B-BRIT-PSYCH-V2-P64  5
ROBIN-ED-1987-CHEST-V91-P252  5 
RIP-A-1988-HDB-QUANTITATIVE-STU-P253  5 
RICE-RE-1990-SCHOLARLY-COMMUNICAT-P138  5 
RESKIN-BF-1979-SOCIOL-EDUC-V52-P129  5
RABKIN-YM-1979-SCIENTOMETRICS-V1-P261  5
PRICE-DJD-1976-J-AM-SOC-INFORM-SCI-V27-P292  5
PORTER-AL-1985-SCIENTOMETRICS-V8-P161  5
PETERS-HPF-1988-INFORMETRICS-87-88-P175  5
PENAVA-Z-1989-J-INFORM-SCI-V15-P71  5
PELZ-DC-1966-SCI-ORG-PRODUCTIVE-C  5
PARDECK-JT-1992-RES-SOCIAL-WORK-PRAC-V2-P487  5
NORTON-GW-1981-AM-J-AGR-ECON-V63-P685  5
NOMA-E-1986-SUBJECT-CLASSIFICATI  5
NEDERHOF-AJ-1988-HDB-QUANTITATIVE-STU-P193  5
NARIN-F-1990-MEASUREMENT-SCI-COOP-V1  5
NARIN-F-1992-RES-POLICY-V21-P237  5
MOTYLEV-VM-1981-INT-FORUM-INFORMATIO-V6-P3  5
MORAVCSIK-MJ-1979-SCIENTOMETRICS-V1-P161  5
MOED-HF-1988-INFORMETRICS-87-88-P133  5 
MERTON-RK-1938-OSIRIS-STUDIES-HIST-P362  5 
MELTZER-BN-1949-AM-J-SOCIOL-V55-P25  5 
MCREYNOLDS-P-1971-AM-PSYCHOL-V26-P400  5 
MCLELLAN-MF-1992-ANESTHESIOLOGY-V77-P185  5
MCKIBBON-KA-1990-COMPUT-BIOMED-RES-V23-P583
MCCAIN-KW-1989-SCIENTOMETRICS-V17-P127  5
MATSON-JL-1989-AM-PSYCHOL-V44-P737  5
MARTIN-BR-1987-NATURE-V330-P123  5
MARGOLIS-J-1967-SCIENCE-V155-P1213  5
MAHONEY-MJ-1987-J-SOC-BEHAV-PERS-V2-P165  5
MACROBERTS-MH-1987-SCIENTOMETRICS-V12-P293  5
LODAHL-JB-1972-AM-SOCIOL-REV-V37-P57  5
LOCK-S-1990-JAMA-J-AM-MED-ASSOC-V263-P1341  5
LOCK-S-1986-DIFFICULT-BALANCE-ED  5
LINE-MB-1970-J-DOC-V26-P46  5
LINE-MB-1979-J-DOC-V35-P265  5
LINDSEY-D-1992-SOC-SERV-REV-V66-P295  5
LINCOLN-YS-1985-NATURALISTIC-INQUIRY  5
LEYDESDORFF-L-1991-SOC-NETWORKS-V13-P301  5
LEYDESDORFF-L-1991-SCIENTOMETRICS-V20-P363  5
LENK-P-1983-J-AM-SOC-INFORM-SCI-V34-P115  5 
LAWANI-SM-1983-J-AM-SOC-INFORM-SCI-V34-P59  5 
LATOUR-B-1987-SCI-ACTION  5 
LATOUR-B-1987-SCI-ACTION-FOLLOW-SC  5 
LANCASTER-FW-1986-SCIENTOMETRICS-V10-P243  5 
LABAND-DN-1985-SOUTHERN-ECON-J-V52-P216  5
KOSTOFF-RN-1991-22ND-P-2^-PITTSB-C  5 
KOSTOFF-RN-1988-IEEE-T-ENG-MANAGEMEN-V35  5 
KING-J-1987-J-INFORMATION-SCI-V13  5 
IRVINE-J-1985-SOC-STUD-SCI-V15-P293  5 
IRVINE-J-1983-SOC-STUD-SCI-V13-P49  5
HICKS-D-1986-R-D-MANAGE-V16-P211  5 
HELMREICH-RL-1980-J-PERS-SOC-PSYCHOL-V39-P896  5 
HE-CP-1986-INFORM-PROCESS-MANAG-V22-P405  5 
HARNAD-S-1982-PEER-COMMENTARY-PEER  5 
HAITUN-SD-1982-SCIENTOMETRICS-V4-P89  5
GRILICHES-Z-1957-ECONOMETRICA-V25-P501  5
GRAVES-PE-1982-AM-ECON-REV-V72-P1131  5
GILLETT-R-1989-HIGH-EDUC-Q-V43-P20  5
GILLETT-R-1987-B-BRIT-PSYCHOL-SOC-V40-P42  5
GARVEY-WD-1979-COMMUNICATION-ESSENC  5
GARFIELD-E-1990-CURRENT-CONTENT-0806-P5  5
GARFIELD-E-1990-CURRENT-CONTENT-0813-P5  5
GARFIELD-E-1992-SCI-PUBL-POLICY-V19-P321  5 
GARFIELD-E-1970-NATURE-V227-P669  5 
GARFIELD-E-1964-SCIENCE-V144-P649  5 
GARFIELD-E-1992-THEORETICAL-MED-V13-P117  5 
FRIEDMAN-LM-1981-STANFORD-LAW-REV-V33-P773  5 
FLEISS-JL-1981-STATISTICAL-METHODS  5 
FEEHAN-PE-1987-LIBR-INFORM-SCI-RES-V9-P173  5
FARHOOMAND-A-1987-DATA-BASE-V18-P48  5
FAIRTHORNE-RA-1969-J-DOC-V25-P319  5
EVERETT-JE-1991-J-AM-SOC-INFORM-SCI-V42-P405  5
ENGLER-RL-1987-NEW-ENGL-J-MED-V317-P1383  5
EICHORN-P-1987-AM-J-PUBLIC-HEALTH-V77-P1011  5
EGGHE-L-1990-INTRO-INFORMETRICS-Q  5
EGGHE-L-1992-INFORM-PROCESS-MANAG-V28-P201
EGGHE-L-1990-INFORMETRICS-89-90-P97  5
DREW-DE-1981-RES-HIGH-EDUC-V14-P305  5
DHAWAN-SM-1980-J-DOC-V36-P24  5
DAVIS-CH-1989-SCIENTOMETRICS-V15-P215  5
COLE-JR-1979-FAIR-SCI-WOMEN-SCI-C  5
COLE-FJ-1917-SCI-PROGR-V11-P578  5
COHEN-J-1988-STATISTICAL-POWER-AN  5
CHEUNG-KPM-1990-SOC-WORK-RES-ABSTR-V26-P23  5
CHEN-YS-1987-SCIENTOMETRICS-V11-P183  5 
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